1
Jackson DJ, Cerveau N, Posnien N. De novo assembly of transcriptomes and differential gene expression analysis using short-read data from emerging model organisms - a brief guide. Front Zool 2024; 21:17. [PMID: 38902827 PMCID: PMC11188175 DOI: 10.1186/s12983-024-00538-y] [Received: 03/05/2024] [Accepted: 06/12/2024] [Indexed: 06/22/2024] Open
Abstract
Many questions in biology benefit greatly from the use of a variety of model systems. High-throughput sequencing methods have been a triumph in the democratization of diverse model systems. They allow for the economical sequencing of an entire genome or transcriptome of interest, and with technical variations can even provide insight into genome organization and the expression and regulation of genes. The analysis and biological interpretation of such large datasets can present significant challenges that depend on the 'scientific status' of the model system. While high-quality genome and transcriptome references are readily available for well-established model systems, the establishment of such references for an emerging model system often requires extensive resources such as finances, expertise and computational capabilities. The de novo assembly of a transcriptome represents an excellent entry point for genetic and molecular studies in emerging model systems as it can efficiently assess gene content while also serving as a reference for differential gene expression studies. However, the process of de novo transcriptome assembly is non-trivial, and as a rule must be empirically optimized for every dataset. For the researcher working with an emerging model system, and with little to no experience with assembling and quantifying short-read data from the Illumina platform, these processes can be daunting. In this guide we outline the major challenges faced when establishing a reference transcriptome de novo and we provide advice on how to approach such an endeavor. We describe the major experimental and bioinformatic steps and provide some broad recommendations and cautions for the newcomer to de novo transcriptome assembly and differential gene expression analyses. Moreover, we provide an initial selection of tools that can assist in the journey from raw short-read data to assembled transcriptome and lists of differentially expressed genes.
Affiliation(s)
- Daniel J Jackson
  - University of Göttingen, Department of Geobiology, Goldschmidtstr. 3, Göttingen, 37077, Germany
- Nicolas Cerveau
  - University of Göttingen, Department of Geobiology, Goldschmidtstr. 3, Göttingen, 37077, Germany
- Nico Posnien
  - University of Göttingen, Department of Developmental Biology, GZMB, Justus-Von-Liebig-Weg 11, Göttingen, 37077, Germany
2
Juteršek M, Gerasymenko IM, Petek M, Haumann E, Vacas S, Kallam K, Gianoglio S, Navarro-Llopis V, Heethoff M, Fuertes IN, Patron N, Orzáez D, Gruden K, Warzecha H, Baebler Š. Transcriptome-informed identification and characterization of Planococcus citri cis- and trans-isoprenyl diphosphate synthase genes. iScience 2024; 27:109441. [PMID: 38523795 PMCID: PMC10960109 DOI: 10.1016/j.isci.2024.109441] [Received: 06/20/2023] [Revised: 10/13/2023] [Accepted: 03/04/2024] [Indexed: 03/26/2024] Open
Abstract
Insect physiology and reproduction depend on several terpenoid compounds, whose biosynthesis is mainly unknown. One enigmatic group of insect monoterpenoids are mealybug sex pheromones, presumably resulting from the irregular coupling activity of unidentified isoprenyl diphosphate synthases (IDSs). Here, we performed a comprehensive search for IDS coding sequences of the pest mealybug Planococcus citri. We queried the available genomic and newly generated short- and long-read P. citri transcriptomic data and identified 18 putative IDS genes, whose phylogenetic analysis indicates several gene family expansion events. In vitro testing confirmed regular short-chain coupling activity with five gene products. With the candidate with highest IDS activity, we also detected low amounts of irregular coupling products, and determined amino acid residues important for chain-length preference and irregular coupling activity. This work therefore provides an important foundation for deciphering terpenoid biosynthesis in mealybugs, including the sex pheromone biosynthesis in P. citri.
Affiliation(s)
- Mojca Juteršek
  - National Institute of Biology, Department of Biotechnology and Systems Biology, Večna pot 111, 1000 Ljubljana, Slovenia
  - Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia
- Iryna M. Gerasymenko
  - Plant Biotechnology and Metabolic Engineering, Department of Biology, Technical University of Darmstadt, Schnittspahnstrasse 4, 64287 Darmstadt, Germany
  - Centre for Synthetic Biology, Technical University of Darmstadt, Schnittspahnstrasse 4, 64287 Darmstadt, Germany
- Marko Petek
  - National Institute of Biology, Department of Biotechnology and Systems Biology, Večna pot 111, 1000 Ljubljana, Slovenia
- Elisabeth Haumann
  - Plant Biotechnology and Metabolic Engineering, Department of Biology, Technical University of Darmstadt, Schnittspahnstrasse 4, 64287 Darmstadt, Germany
  - Centre for Synthetic Biology, Technical University of Darmstadt, Schnittspahnstrasse 4, 64287 Darmstadt, Germany
- Sandra Vacas
  - Instituto Agroforestal del Mediterráneo-CEQA, Universitat Politècnica de València, Camino de Vera s/n, Valencia, Spain
- Kalyani Kallam
  - Engineering Biology, Earlham Institute, Norwich Research Park, Norwich, Norfolk NR4 7UZ, UK
- Silvia Gianoglio
  - Institute for Plant Molecular and Cell Biology (IBMCP), Consejo Superior de Investigaciones Científicas (CSIC) - Universitat Politècnica de València (UPV), Valencia, Spain
- Vicente Navarro-Llopis
  - Instituto Agroforestal del Mediterráneo-CEQA, Universitat Politècnica de València, Camino de Vera s/n, Valencia, Spain
- Michael Heethoff
  - Animal Evolutionary Ecology, Department of Biology, Technical University of Darmstadt, Schnittspahnstrasse 4, 64287 Darmstadt, Germany
- Nicola Patron
  - Engineering Biology, Earlham Institute, Norwich Research Park, Norwich, Norfolk NR4 7UZ, UK
- Diego Orzáez
  - Institute for Plant Molecular and Cell Biology (IBMCP), Consejo Superior de Investigaciones Científicas (CSIC) - Universitat Politècnica de València (UPV), Valencia, Spain
- Kristina Gruden
  - National Institute of Biology, Department of Biotechnology and Systems Biology, Večna pot 111, 1000 Ljubljana, Slovenia
- Heribert Warzecha
  - Plant Biotechnology and Metabolic Engineering, Department of Biology, Technical University of Darmstadt, Schnittspahnstrasse 4, 64287 Darmstadt, Germany
  - Centre for Synthetic Biology, Technical University of Darmstadt, Schnittspahnstrasse 4, 64287 Darmstadt, Germany
- Špela Baebler
  - National Institute of Biology, Department of Biotechnology and Systems Biology, Večna pot 111, 1000 Ljubljana, Slovenia
3
Limonta G, Panti C, Fossi MC, Nardi F, Baini M. Exposure to virgin and marine incubated microparticles of biodegradable and conventional polymers modulates the hepatopancreas transcriptome of Mytilus galloprovincialis. Journal of Hazardous Materials 2024; 468:133819. [PMID: 38402680 DOI: 10.1016/j.jhazmat.2024.133819] [Received: 12/05/2023] [Revised: 02/02/2024] [Accepted: 02/15/2024] [Indexed: 02/27/2024]
Abstract
Biodegradable polymers have been proposed as an alternative to conventional plastics to mitigate the impact of marine litter, but the research investigating their toxicity is still in its infancy. This study uses transcriptomics to evaluate the potential ecotoxicological effects on Mytilus galloprovincialis of both virgin and marine-incubated microparticles (MPs), at an environmentally relevant concentration (0.1 mg/l), made of different biodegradable polymers (polycaprolactone, Mater-Bi, cellulose) and a conventional polymer (polyethylene). This approach is increasingly being used to assess the effects of pollutants on organisms, obtaining data on numerous biological pathways simultaneously. Whole hepatopancreas de novo transcriptome sequencing was performed, identifying 972 genes differentially expressed across experimental groups compared to the control. Comparative transcriptomic profiling shows that the predominant effect is attributable to the marine incubation of MPs, especially for incubated polycaprolactone (731 DEGs). Mater-Bi and cellulose alter the smallest number of genes and biological processes in the mussel hepatopancreas. All microparticles, regardless of their polymeric composition, dysregulated biological processes related to innate immunity and fatty acid metabolism. These findings highlight the necessity of considering the interactions of MPs with environmental factors in the marine ecosystem when performing ecotoxicological evaluations. The results help fill current knowledge gaps regarding the potential environmental impacts of biodegradable polymers.
Affiliation(s)
- Giacomo Limonta
  - Department of Physical, Earth and Environmental Sciences, University of Siena, Via P.A. Mattioli, 4, Siena, Italy; National Biodiversity Future Center (NBFC), Palermo, Italy
- Cristina Panti
  - Department of Physical, Earth and Environmental Sciences, University of Siena, Via P.A. Mattioli, 4, Siena, Italy; National Biodiversity Future Center (NBFC), Palermo, Italy
- Maria Cristina Fossi
  - Department of Physical, Earth and Environmental Sciences, University of Siena, Via P.A. Mattioli, 4, Siena, Italy; National Biodiversity Future Center (NBFC), Palermo, Italy
- Francesco Nardi
  - National Biodiversity Future Center (NBFC), Palermo, Italy; Department of Life Sciences, University of Siena, Via A. Moro, 2, Siena, Italy
- Matteo Baini
  - Department of Physical, Earth and Environmental Sciences, University of Siena, Via P.A. Mattioli, 4, Siena, Italy; National Biodiversity Future Center (NBFC), Palermo, Italy
4
Vences M, Anslan S, Sabino-Pinto J, Bonilla-Flores M, Echeverría-Galindo P, John U, Nass B, Pérez L, Preick M, Zhu L, Schwalb A. Dataset from RNAseq analysis of differential gene expression among developmental stages of two non-marine ostracodes. Data Brief 2024; 53:110070. [PMID: 38317728 PMCID: PMC10838692 DOI: 10.1016/j.dib.2024.110070] [Received: 09/12/2023] [Revised: 12/12/2023] [Accepted: 01/11/2024] [Indexed: 02/07/2024] Open
Abstract
We contribute transcriptomic data for two species of Ostracoda, an early-diverged group of small-sized pancrustaceans. Data include new reference transcriptomes for two asexual non-marine species (Dolerocypris sinensis and Heterocypris aff. salina), as well as single-specimen transcriptomic data that served to analyse gene expression across four developmental stages in D. sinensis. Data are evaluated by computing gene expression profiles of the different developmental stages which consistently placed eggs and small larvae (at the stage of instar A-8) similar to each other, and apart from adults which were distinct from all other developmental stages but closest to large larvae (instar A-4). We further evaluated the transcriptomic data with two newly sequenced low-coverage genomes of the target species. The new data thus document the feasibility of obtaining reliable transcriptomic data from single specimens - even eggs - of these small metazoans.
Affiliation(s)
- Miguel Vences
  - Zoological Institute, Technische Universität Braunschweig, Mendelssohnstr. 4, 38106 Braunschweig, Germany
- Sten Anslan
  - Institute of Ecology and Earth Sciences, University of Tartu, Juhan Liivi 2, 50409 Tartu, Estonia
  - Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
- Joana Sabino-Pinto
  - Groningen Institute for Evolutionary Life Sciences, University of Groningen, Nijenborgh 7, 9747 AG Groningen, the Netherlands
- Mauricio Bonilla-Flores
  - Institute of Geosystems and Bioindication, Technische Universität Braunschweig, Langer Kamp 19c, 38106 Braunschweig, Germany
- Paula Echeverría-Galindo
  - Institute of Geosystems and Bioindication, Technische Universität Braunschweig, Langer Kamp 19c, 38106 Braunschweig, Germany
- Uwe John
  - Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung, Am Handelshafen 12, 27570 Bremerhaven, Germany
- Benneth Nass
  - Zoological Institute, Technische Universität Braunschweig, Mendelssohnstr. 4, 38106 Braunschweig, Germany
- Liseth Pérez
  - Institute of Geosystems and Bioindication, Technische Universität Braunschweig, Langer Kamp 19c, 38106 Braunschweig, Germany
- Michaela Preick
  - Faculty of Mathematics and Natural Sciences, Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Liping Zhu
  - Institute of Tibetan Plateau Research, Chinese Academy of Sciences, 16 Lincui Road, Beijing 100101, China
- Antje Schwalb
  - Institute of Geosystems and Bioindication, Technische Universität Braunschweig, Langer Kamp 19c, 38106 Braunschweig, Germany
5
Ghazal A, Clarke D, Abdel-Rahman MA, Ribeiro A, Collie-Duguid E, Pattinson C, Burgoyne K, Muhammad T, Alfadhel S, Heidari Z, Samir R, Gerges MM, Nkene I, Colamarino RA, Hijazi K, Houssen WE. Venomous gland transcriptome and venom proteomic analysis of the scorpion Androctonus amoreuxi reveal new peptides with anti-SARS-CoV-2 activity. Peptides 2024; 173:171139. [PMID: 38142817 DOI: 10.1016/j.peptides.2023.171139] [Received: 08/31/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract
The recent COVID-19 pandemic shows the critical need for novel broad spectrum antiviral agents. Scorpion venoms are known to contain highly bioactive peptides, several of which have demonstrated strong antiviral activity against a range of viruses. We have generated the first annotated reference transcriptome for the Androctonus amoreuxi venom gland and used high performance liquid chromatography, transcriptome mining, circular dichroism and mass spectrometric analysis to purify and characterize twelve previously undescribed venom peptides. Selected peptides were tested for binding to the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein and inhibition of the spike RBD - human angiotensin-converting enzyme 2 (hACE2) interaction using surface plasmon resonance-based assays. Seven peptides showed dose-dependent inhibitory effects, albeit with IC50 in the high micromolar range (117-1202 µM). The most active peptide was synthesized using solid phase peptide synthesis and tested for its antiviral activity against SARS-CoV-2 (Lineage B.1.1.7). On exposure to the synthetic peptide of a human lung cell line infected with replication-competent SARS-CoV-2, we observed an IC50 of 200 nM, which was nearly 600-fold lower than that observed in the RBD - hACE2 binding inhibition assay. Our results show that scorpion venom peptides can inhibit the SARS-CoV-2 replication although unlikely through inhibition of spike RBD - hACE2 interaction as the primary mode of action. Scorpion venom peptides represent excellent scaffolds for design of novel anti-SARS-CoV-2 constrained peptides. Future studies should fully explore their antiviral mode of action as well as the structural dynamics of inhibition of target virus-host interactions.
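The "nearly 600-fold" figure in this abstract can be checked with a short calculation. The sketch below assumes that the comparison is against the lowest value of the reported binding-inhibition range (117 µM), which the abstract does not state explicitly; the function and variable names are illustrative only.

```python
# Fold difference between two IC50 values quoted in the abstract.
MICROMOLAR = 1e-6  # mol/L
NANOMOLAR = 1e-9   # mol/L

# Assumption: the most active peptide corresponds to the lowest value
# of the reported 117-1202 µM range in the RBD - hACE2 assay.
binding_ic50 = 117 * MICROMOLAR
# IC50 reported for the synthetic peptide in infected lung cells.
antiviral_ic50 = 200 * NANOMOLAR


def fold_difference(higher: float, lower: float) -> float:
    """Return how many times larger `higher` is than `lower`."""
    return higher / lower


fold = fold_difference(binding_ic50, antiviral_ic50)
print(f"{fold:.0f}-fold lower IC50 in the antiviral assay")
```

Under that assumption the ratio is about 585, consistent with the abstract's "nearly 600-fold".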
Affiliation(s)
- Ahmad Ghazal
  - Institute of Medical Sciences, University of Aberdeen, Aberdeen AB25 2ZD, UK; Department of Chemistry, University of Aberdeen, Aberdeen AB24 3UE, UK
- David Clarke
  - School of Chemistry, University of Edinburgh, Edinburgh EH9 3FJ, UK
- Antonio Ribeiro
  - Centre for Genome-Enabled Biology and Medicine, University of Aberdeen, Aberdeen AB24 3RY, UK
- Elaina Collie-Duguid
  - Centre for Genome-Enabled Biology and Medicine, University of Aberdeen, Aberdeen AB24 3RY, UK
- Craig Pattinson
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen AB25 2ZD, UK
- Kate Burgoyne
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen AB25 2ZD, UK
- Taj Muhammad
  - Pharmacognosy, Department of Pharmaceutical Biosciences, Uppsala University, Biomedical Centre, Box 591 SE-75124 Uppsala, Sweden
- Sanad Alfadhel
  - Institute of Medical Sciences, University of Aberdeen, Aberdeen AB25 2ZD, UK
- Zeynab Heidari
  - Centre for Genome-Enabled Biology and Medicine, University of Aberdeen, Aberdeen AB24 3RY, UK
- Reham Samir
  - Zoology Department, Faculty of Science, Suez Canal University, Ismailia 41522, Egypt
- Mariam M Gerges
  - Zoology Department, Faculty of Science, Suez Canal University, Ismailia 41522, Egypt
- Istifanus Nkene
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen AB25 2ZD, UK
- Rosa A Colamarino
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen AB25 2ZD, UK
- Karolin Hijazi
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen AB25 2ZD, UK
- Wael E Houssen
  - Institute of Medical Sciences, University of Aberdeen, Aberdeen AB25 2ZD, UK; Department of Chemistry, University of Aberdeen, Aberdeen AB24 3UE, UK
6
Fonseca-González I, Velasquez-Agudelo E, Londoño-Mesa MH, Álvarez JC. De novo transcriptome sequencing and annotation of the Antarctic polychaete Microspio moorei (Spionidae) with its characterization of the heat stress-related proteins (HSP, SOD & CAT). Mar Genomics 2024; 73:101085. [PMID: 38301367 DOI: 10.1016/j.margen.2024.101085] [Received: 09/26/2023] [Revised: 12/07/2023] [Accepted: 01/22/2024] [Indexed: 02/03/2024]
Abstract
We present a de novo transcriptome assembly for the non-model Antarctic polychaete worm Microspio moorei (Spionidae), collected during an Antarctic field expedition in Fildes Bay, King George Island, Antarctic Peninsula, in 2017. Here, we report the first transcriptome reference array for Microspio spp. The gene sequences of the spionid worm were annotated across a wide range of functions (i.e., biological and metabolic processes, catalytic processes, and catalytic activity). The HSP70, HSP90, SOD and CAT families were compared to reported annelid transcriptomes and proteomes. The phylogenetic analysis using COI, 16S, and 18S markers effectively clusters the species within the family. However, it also casts uncertainty on the monophyly of the genus Microspio, indicating the necessity for additional data and potentially requiring a reevaluation of its grouping. Within these protein families, 3D modeling software was used to create one representative protein structure for each. Structural predictions were compared with related reported annelids living at different temperatures and a human X-ray reference. We found structural differences (RMSE >1.8) between the human HSP proteins but no significant differences between the polychaete-predicted proteins (RMSE <1.2). These results encourage further research on heat stress-related proteins, the development of genetic markers for climate change-induced temperature stress, and the study of the underlying mechanisms of the heat response. Moreover, these results motivate the extension of these findings to congeneric species.
Affiliation(s)
- Idalyd Fonseca-González
  - LimnoBasE & Biotamar Research Group, Institute of Biology, University of Antioquia, Medellín 050010, Colombia
- Esteban Velasquez-Agudelo
  - Research Group in Biodiversity, Evolution and Conservation (BEC), EAFIT University, Medellín 050022, Colombia
- Mario H Londoño-Mesa
  - LimnoBasE & Biotamar Research Group, Institute of Biology, University of Antioquia, Medellín 050010, Colombia
- Javier C Álvarez
  - Research Group in Biodiversity, Evolution and Conservation (BEC), EAFIT University, Medellín 050022, Colombia
7
Alvarez RV, Landsman D. GTax: improving de novo transcriptome assembly by removing foreign RNA contamination. Genome Biol 2024; 25:12. [PMID: 38191464 PMCID: PMC10773103 DOI: 10.1186/s13059-023-03141-2] [Received: 06/08/2022] [Accepted: 12/08/2023] [Indexed: 01/10/2024] Open
Abstract
The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
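The general idea of screening reads for foreign contamination with BLAST hits can be illustrated as follows. This is a minimal sketch and not GTax's actual implementation: it assumes a tab-separated hit table with four columns (read ID, subject taxonomy ID, percent identity, bit score), and the function name, column layout, identity threshold and example rows are all hypothetical.

```python
import csv
import io


def foreign_read_ids(blast_tsv: str, target_taxids: set[str],
                     min_identity: float = 90.0) -> set[str]:
    """Return IDs of reads whose best-scoring hit lies outside the target taxa.

    Assumed columns per row: read ID, subject taxid, percent identity, bit score.
    """
    best = {}  # read ID -> (bit score, taxid) of the best hit seen so far
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        read_id, taxid, identity, bitscore = row
        if float(identity) < min_identity:
            continue  # ignore weak hits (threshold is an assumption)
        score = float(bitscore)
        if read_id not in best or score > best[read_id][0]:
            best[read_id] = (score, taxid)
    return {read for read, (_, taxid) in best.items()
            if taxid not in target_taxids}


# Made-up example rows; 4081 is the NCBI taxid of Solanum lycopersicum.
hits = "\n".join([
    "read1\t4081\t99.0\t180",
    "read2\t562\t98.5\t175",
    "read3\t4081\t97.0\t150",
    "read3\t9606\t91.0\t90",
    "read4\t9606\t85.0\t60",
])
contaminated = foreign_read_ids(hits, target_taxids={"4081"})
print(sorted(contaminated))
```

Reads flagged this way would be dropped before assembly. A real screen would compare whole lineages rather than single taxonomy IDs, which is the role a taxonomy-structured database plays.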
Affiliation(s)
- Roberto Vera Alvarez
  - Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
- David Landsman
  - Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
8
Shabbir M, Mithani A. Roast: a tool for reference-free optimization of supertranscriptome assemblies. BMC Bioinformatics 2024; 25:2. [PMID: 38166712 PMCID: PMC10763045 DOI: 10.1186/s12859-023-05614-4] [Received: 03/10/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND: Transcriptomic studies involving organisms for which reference genomes are not available typically start by generating a de novo transcriptome or supertranscriptome assembly from the raw RNA-seq reads. Assembling a supertranscriptome is, however, a challenging task due to the widely varying abundance of mRNA transcripts, alternative splicing, and sequencing errors. As a result, popular de novo supertranscriptome assembly tools generate assemblies containing contigs that are partially assembled, fragmented, falsely chimeric, or locally mis-assembled, which decreases assembly accuracy. Commonly available tools for assembly improvement rely primarily on running BLAST against closely related species, making their accuracy and reliability conditional on the availability of data for closely related organisms. RESULTS: We present ROAST, a tool for optimization of supertranscriptome assemblies that uses paired-end RNA-seq data from the Illumina sequencing platform to iteratively identify and fix assembly errors. It relies solely on the error signatures generated by RNA-seq alignment tools, including soft-clips, unexpected expression coverage, and reads with mates unmapped or mapped to a different contig, without performing BLAST searches against other organisms. Evaluation results using simulated as well as real datasets show that ROAST significantly improves assembly quality by identifying and fixing various assembly errors. CONCLUSION: ROAST provides a reference-free approach to optimizing supertranscriptome assemblies, highlighting its utility in refining de novo supertranscriptome assemblies of non-model organisms.
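One of the alignment-derived error signatures named here, soft-clipping, can be read directly from the CIGAR string of each aligned read. The sketch below is not ROAST's implementation; it only illustrates how contigs with recurrent soft-clipped reads might be flagged. The function names, thresholds (`min_clip`, `min_reads`) and example alignments are hypothetical.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> tuple[int, int]:
    """Return (left, right) soft-clipped base counts for one CIGAR string."""
    # Hard clips (H) may flank soft clips, so they are dropped first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


def contigs_with_clip_signal(alignments, min_clip=10, min_reads=3):
    """Flag contigs where at least `min_reads` reads carry a clip >= `min_clip`.

    `alignments` is an iterable of (contig name, CIGAR string) pairs.
    """
    counts = Counter()
    for contig, cigar in alignments:
        if max(soft_clipped_bases(cigar)) >= min_clip:
            counts[contig] += 1
    return sorted(c for c, n in counts.items() if n >= min_reads)


# Made-up alignments for illustration.
example = [
    ("contig_1", "100M"),
    ("contig_2", "30S70M"),
    ("contig_2", "25S75M"),
    ("contig_2", "80M20S"),
    ("contig_3", "5S95M"),
]
flagged = contigs_with_clip_signal(example)
print(flagged)
```

A cluster of soft-clipped reads at one position suggests that the contig sequence diverges from the reads there, for example at a false chimera junction or an incomplete contig end.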
Affiliation(s)
- Madiha Shabbir
  - Department of Life Sciences, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), DHA, Lahore, 54792, Pakistan
- Aziz Mithani
  - Department of Life Sciences, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), DHA, Lahore, 54792, Pakistan
9
Šimková A, Civáňová Křížová K, Voříšková K, Vetešník L, Bystrý V, Demko M. Transcriptome Profile Analyses of Head Kidney in Roach (Rutilus rutilus), Common Bream (Abramis brama) and Their Hybrids: Does Infection by Monogenean Parasites in Freshwater Fish Reveal Differences in Fish Vigour among Parental Species and Their Hybrids? Biology 2023; 12:1199. [PMID: 37759598 PMCID: PMC10525477 DOI: 10.3390/biology12091199] [Received: 07/03/2023] [Revised: 08/23/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023]
Abstract
Hybrid generations usually face either a heterosis advantage or a breakdown that can be expressed by the level of parasite infection in hybrid hosts. Hybrids are less infected by parasites than parental species (especially F1 generations) or more infected than parental species (especially post-F1 generations). We performed an experiment with the blood-feeding gill parasite Paradiplozoon homoion (Monogenea) infecting the leuciscid species Abramis brama and Rutilus rutilus, their F1 generation and two backcross generations. Backcross generations tended to be more parasitized than parental lines and the F1 generation. The number of differentially expressed genes (DEGs) was lower in F1 hybrids and higher in backcross hybrids when compared to each of the parental lines. The main groups of DEGs were shared among lines; however, A. brama and R. rutilus differed in some of the top gene ontology (GO) terms. DEG analyses revealed the role of heme binding and erythrocyte differentiation after infection by blood-feeding P. homoion. The two backcross generations shared some of the top GO terms, representing mostly downregulated genes associated with P. homoion infection. KEGG analysis revealed the importance of disease-associated pathways; the majority of them were shared by the two backcross generations. Our study revealed the most pronounced DEGs associated with blood-feeding monogeneans in backcross hybrids, potentially (but not exclusively) explainable by hybrid breakdown. The lower number of DEGs in F1 hybrids, which were less parasitized than backcross hybrids, is in line with the hybrid advantage.
Affiliation(s)
- Andrea Šimková
  - Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic; (K.C.K.); (K.V.)
- Kristína Civáňová Křížová
  - Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic; (K.C.K.); (K.V.)
- Kristýna Voříšková
  - Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic; (K.C.K.); (K.V.)
- Lukáš Vetešník
  - Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic; (K.C.K.); (K.V.)
  - Institute of Vertebrate Biology of the Czech Academy of Sciences, Květná 8, 603 65 Brno, Czech Republic; (L.V.)
- Vojtěch Bystrý
  - Central European Institute of Technology, Masaryk University, 625 00 Brno, Czech Republic; (V.B.); (M.D.)
- Martin Demko
  - Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic; (K.C.K.); (K.V.)
  - Central European Institute of Technology, Masaryk University, 625 00 Brno, Czech Republic; (V.B.); (M.D.)
10
Ahmadi H, Sheikh-Assadi M, Fatahi R, Zamani Z, Shokrpour M. Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis. Sci Rep 2023; 13:12415. [PMID: 37524806 PMCID: PMC10390528 DOI: 10.1038/s41598-023-39620-6] [Received: 05/11/2023] [Accepted: 07/27/2023] [Indexed: 08/02/2023] Open
Abstract
Non-erroneous and well-optimized transcriptome assembly is a crucial prerequisite for authentic downstream analyses. Each de novo assembler has its own algorithm-dependent pros and cons in handling assembly issues and should be specifically tested for each dataset. Here, we examined the efficiency of seven state-of-the-art assemblers on ~30 Gb of data obtained from mRNA-sequencing of Thymus daenensis. In an ensemble workflow, combining the outputs of different assemblers with an additional redundancy-reducing step could generate an optimized outcome in terms of completeness, annotatability, and ORF richness. Based on the normalized scores of 16 benchmarking metrics, the assemblers ranked from best to worst as follows: EvidentialGene, BinPacker, Trinity, rnaSPAdes, CAP3, IDBA-trans, and Velvet-Oases. EvidentialGene, as the best assembler, produced 316,786 transcripts in total, of which 235,730 (74%) were predicted to have a unique protein hit (on uniref100), and half of its transcripts contained an ORF. The total number of unique BLAST hits for EvidentialGene was approximately three times greater than that of the worst assembler (Velvet-Oases). EvidentialGene could even capture 17% and 7% more average BLAST hits than BinPacker and Trinity. Although BinPacker and CAP3 produced longer transcripts, EvidentialGene showed a higher collinearity between transcript size and ORF length. Compared with the other programs, EvidentialGene yielded a higher number of optimal transcript sets, more full-length transcripts, and fewer possible misassemblies. Our finding corroborates that in non-model species, relying on a single assembler may not give an entirely satisfactory result. Therefore, this study proposes an ensemble approach of accompanying EvidentialGene pipelines to acquire a superior assembly for T. daenensis.
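Ranking assemblers by normalized scores over several metrics can be sketched as below. The abstract does not say how the 16 metrics were normalized or combined, so min-max scaling with an unweighted sum is an assumption here; the assembler names "A", "B", "C", the metric names and all values are placeholders, not results from the study.

```python
def min_max(values, higher_is_better=True):
    """Scale values to [0, 1]; invert when a lower raw value is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # no spread: every assembler ties
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_assemblers(metrics, directions):
    """Return assembler names sorted by summed normalized score, best first.

    `metrics` maps metric name -> {assembler name: raw value};
    `directions` maps metric name -> True if higher raw values are better.
    """
    names = list(next(iter(metrics.values())))
    totals = dict.fromkeys(names, 0.0)
    for metric, per_assembler in metrics.items():
        scaled = min_max([per_assembler[n] for n in names], directions[metric])
        for name, score in zip(names, scaled):
            totals[name] += score
    return sorted(names, key=lambda n: totals[n], reverse=True)


# Placeholder values for three hypothetical assemblers.
metrics = {
    "complete_orthologs_pct": {"A": 90, "B": 80, "C": 70},
    "n50": {"A": 1500, "B": 1800, "C": 1200},
    "chimera_pct": {"A": 2, "B": 5, "C": 4},
}
directions = {"complete_orthologs_pct": True, "n50": True, "chimera_pct": False}
ranking = rank_assemblers(metrics, directions)
print(ranking)
```

Normalizing each metric before summing keeps metrics with large raw ranges, such as N50, from dominating percentage-based ones.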
Affiliation(s)
- Hosein Ahmadi
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
- Morteza Sheikh-Assadi
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
- Reza Fatahi
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
- Zabihollah Zamani
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
- Majid Shokrpour
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
11
Batista da Silva I, Aciole Barbosa D, Kavalco KF, Nunes LR, Pasa R, Menegidio FB. Discovery of putative long non-coding RNAs expressed in the eyes of Astyanax mexicanus (Actinopterygii: Characidae). Sci Rep 2023; 13:12051. [PMID: 37491348 PMCID: PMC10368750 DOI: 10.1038/s41598-023-34198-5] [Received: 01/03/2023] [Accepted: 04/25/2023] [Indexed: 07/27/2023] Open
Abstract
Astyanax mexicanus is a well-known model species that has two morphotypes: cavefish, from subterranean rivers, and surface fish, from surface rivers. They are morphologically distinct due to many troglomorphic traits in the cavefish, such as the absence of eyes. Most studies on A. mexicanus focus on eye development and the protein-coding genes involved in the process. However, lncRNAs have not received the same attention, and very little is known about them. This study aimed to fill this knowledge gap by identifying, describing, classifying, and annotating lncRNAs expressed in the embryonic eye tissue of cavefish and surface fish. To do so, we constructed a concise workflow to assemble and evaluate transcriptomes, annotate protein-coding genes and ncRNA families, predict coding potential, identify putative lncRNAs, map them, and predict interactions. This approach resulted in the identification of 33,069 and 19,493 putative lncRNAs mapped in cavefish and surface fish, respectively. Thousands of these lncRNAs were annotated and identified as conserved in human and several species of fish. Hundreds of them were validated in silico through ESTs. We identified lncRNAs associated with genes related to eye development. This is the case for a few lncRNAs associated with sox2, which we suggest are isomorphs of SOX2-OT, a lncRNA that can regulate the expression of sox2. This work is one of the first studies to focus on the description of lncRNAs in A. mexicanus, highlighting several lncRNA targets and setting an important precedent for future studies focusing on lncRNAs expressed in A. mexicanus.
Affiliation(s)
- Iuri Batista da Silva
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, MG, 31270-901, Brazil
  - Laboratory of Ecological and Evolutionary Genetics, Institute of Biological and Health Sciences, Federal University of Viçosa Campus Rio Paranaíba, Rio Paranaíba, MG, 38810-000, Brazil
- David Aciole Barbosa
  - Integrated Biotechnology Center, University of Mogi das Cruzes (UMC), Av. Dr. Cândido X. de Almeida and Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil
- Karine Frehner Kavalco
  - Laboratory of Ecological and Evolutionary Genetics, Institute of Biological and Health Sciences, Federal University of Viçosa Campus Rio Paranaíba, Rio Paranaíba, MG, 38810-000, Brazil
- Luiz R Nunes
  - Center for Natural and Human Sciences, Federal University of ABC, São Bernardo do Campo, SP, 09606-045, Brazil
- Rubens Pasa
  - Laboratory of Ecological and Evolutionary Genetics, Institute of Biological and Health Sciences, Federal University of Viçosa Campus Rio Paranaíba, Rio Paranaíba, MG, 38810-000, Brazil
- Fabiano B Menegidio
  - Integrated Biotechnology Center, University of Mogi das Cruzes (UMC), Av. Dr. Cândido X. de Almeida and Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil
12
Suresh S, Mirasole A, Ravasi T, Vizzini S, Schunter C. Brain transcriptome of gobies inhabiting natural CO2 seeps reveal acclimation strategies to long-term acidification. Evol Appl 2023; 16:1345-1358. [PMID: 37492147 PMCID: PMC10363848 DOI: 10.1111/eva.13574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 06/14/2023] [Accepted: 06/15/2023] [Indexed: 07/27/2023]
Abstract
Ocean acidification (OA) is known to affect the physiology, survival, behaviour and fitness of various fish species with repercussions at the population, community and ecosystem levels. Some fish species, however, seem to acclimate rapidly to OA conditions and even thrive in acidified environments. The molecular mechanisms that enable species to successfully inhabit high CO2 environments have not been fully elucidated especially in wild fish populations. Here, we used the natural CO2 seep in Vulcano Island, Italy to study the effects of elevated CO2 exposure on the brain transcriptome of the anemone goby, a species with high population density in the CO2 seep and investigate their potential for acclimation. Compared to fish from environments with ambient CO2, gobies living in the CO2 seep showed differences in the expression of transcripts involved in ion transport and pH homeostasis, cellular stress, immune response, circadian rhythm and metabolism. We also found evidence of potential adaptive mechanisms to restore the functioning of GABAergic pathways, whose activity can be affected by exposure to elevated CO2 levels. Our findings indicate that gobies living in the CO2 seep may be capable of mitigating CO2-induced oxidative stress and maintaining physiological pH while meeting the consequent increased energetic costs. The conspicuous difference in the expression of core circadian rhythm transcripts could provide an adaptive advantage by increasing the flexibility of physiological processes in elevated CO2 conditions thereby facilitating acclimation. Our results show potential molecular processes of acclimation to elevated CO2 in gobies enabling them to thrive in the acidified waters of Vulcano Island.
Affiliation(s)
- Sneha Suresh
  - Swire Institute of Marine Science, School of Biological Sciences, The University of Hong Kong, Hong Kong SAR, China
- Alice Mirasole
  - Department of Integrative Marine Ecology, Ischia Marine Centre, Stazione Zoologica Anton Dohrn, Naples, Italy
- Timothy Ravasi
  - Marine Climate Change Unit, Okinawa Institute of Science and Technology Graduate University, Onna-son, Japan
  - Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, Australia
- Salvatrice Vizzini
  - Department of Earth and Marine Sciences, University of Palermo, Palermo, Italy
  - CoNISMa, National Inter-University Consortium for Marine Science, Roma, Italy
- Celia Schunter
  - Swire Institute of Marine Science, School of Biological Sciences, The University of Hong Kong, Hong Kong SAR, China
  - State Key Laboratory of Marine Pollution, City University of Hong Kong, Hong Kong SAR, China
13
Bendele KG, Guerrero FD, Lohmeyer KH, Foil LD, Metz RP, Johnson CD. Horn fly transcriptome data of ten populations from the southern United States with varying degrees and molecular mechanisms of pesticide resistance. Data Brief 2023; 48:109272. [PMID: 37363058 PMCID: PMC10285531 DOI: 10.1016/j.dib.2023.109272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 05/13/2023] [Accepted: 05/22/2023] [Indexed: 06/28/2023]
Abstract
Haematobia irritans irritans (Linnaeus, 1758: Diptera: Muscidae), the horn fly, is an external parasite of penned and pastured livestock that causes a major economic impact on cattle production worldwide. Pesticides such as synthetic pyrethroids and organophosphates are routinely used to control horn flies; however, resistance to these chemicals has become a concern in several countries. To further elucidate the molecular mechanisms of resistance in horn fly populations, we sequenced the transcriptomes of ten populations of horn flies from the southern US possessing varying degrees of pesticide resistance levels to pyrethroids, organophosphates, and endosulfans. We employed an Illumina paired end HiSeq approach, followed by de novo assembly of the transcriptomes using CLC Genomics Workbench 8.0.1 De Novo Assembler using multiple kmers, and annotation using Blast2GO PRO version 5.2.5. The Gene Ontology biological process term Response to Insecticide was found in all the populations, but at an increased frequency in the populations with higher levels of insecticide resistance. The raw sequence reads are archived in the Sequence Read Archive (SRA) and assembled population transcriptomes in the Transcriptome Shotgun Assembly (TSA) at the National Center for Biotechnology Information (NCBI).
Affiliation(s)
- Kylie G. Bendele
  - USDA-ARS Knipling-Bushland US Livestock Insects Research Laboratory, 2700 Fredericksburg Rd., Kerrville, TX 78028, USA
- Felix D. Guerrero
  - USDA-ARS Knipling-Bushland US Livestock Insects Research Laboratory, 2700 Fredericksburg Rd., Kerrville, TX 78028, USA
  - Massey University, School of Veterinary Science, Genetics and Molecular Biology, Private Bag 11 222, Palmerston North, Manawatu-Wanganui 4442, New Zealand
- Kimberly H. Lohmeyer
  - USDA-ARS Knipling-Bushland US Livestock Insects Research Laboratory, 2700 Fredericksburg Rd., Kerrville, TX 78028, USA
- Lane D. Foil
  - Department of Entomology, Louisiana State University Agriculture Center, Baton Rouge, LA 70803, USA
- Richard P. Metz
  - Genomics and Bioinformatics Service, Texas A&M AgriLife Research, 1500 Research Parkway, Room 250, Centeq Building A, College Station, TX 77845, USA
- Charles D. Johnson
  - Genomics and Bioinformatics Service, Texas A&M AgriLife Research, 1500 Research Parkway, Room 250, Centeq Building A, College Station, TX 77845, USA
14
Krinos AI, Cohen NR, Follows MJ, Alexander H. Reverse engineering environmental metatranscriptomes clarifies best practices for eukaryotic assembly. BMC Bioinformatics 2023; 24:74. [PMID: 36869298 PMCID: PMC9983209 DOI: 10.1186/s12859-022-05121-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Accepted: 12/21/2022] [Indexed: 03/05/2023]
Abstract
BACKGROUND Diverse communities of microbial eukaryotes in the global ocean provide a variety of essential ecosystem services, from primary production and carbon flow through trophic transfer to cooperation via symbioses. Increasingly, these communities are being understood through the lens of omics tools, which enable high-throughput processing of diverse communities. Metatranscriptomics offers an understanding of near real-time gene expression in microbial eukaryotic communities, providing a window into community metabolic activity. RESULTS Here we present a workflow for eukaryotic metatranscriptome assembly, and validate the ability of the pipeline to recapitulate real and manufactured eukaryotic community-level expression data. We also include an open-source tool for simulating environmental metatranscriptomes for testing and validation purposes. We reanalyze previously published metatranscriptomic datasets using our metatranscriptome analysis approach. CONCLUSION We determined that a multi-assembler approach improves eukaryotic metatranscriptome assembly based on recapitulated taxonomic and functional annotations from an in-silico mock community. The systematic validation of metatranscriptome assembly and annotation methods provided here is a necessary step to assess the fidelity of our community composition measurements and functional content assignments from eukaryotic metatranscriptomes.
Affiliation(s)
- Arianna I Krinos
  - MIT-WHOI Joint Program in Oceanography and Applied Ocean Science and Engineering, Cambridge and Woods Hole, MA, USA
  - Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
  - Department of Earth, Atmospheric, and Planetary Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Natalie R Cohen
  - Skidaway Institute of Oceanography, University of Georgia, Savannah, GA, USA
- Michael J Follows
  - Department of Earth, Atmospheric, and Planetary Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Harriet Alexander
  - Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
15
Full-length transcriptome from different life stages of cobia (Rachycentron canadum, Rachycentridae). Sci Data 2023; 10:97. [PMID: 36797271 PMCID: PMC9935508 DOI: 10.1038/s41597-022-01907-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 12/14/2022] [Indexed: 02/18/2023]
Abstract
Cobia (Rachycentron canadum, Rachycentridae) is one of the prospective species for mariculture. Transcriptome-based studies on cobia have been hampered by an inadequate reference genome and a lack of full-length cDNAs. We used a long-read sequencing technology (PacBio Sequel II Iso-Seq3 SMRT) to obtain complete transcriptome sequences from larvae, juveniles, and various tissues of adult cobia. A single SMRT cell generated 99 gigabytes of data and 51,205,946,694 bases. A total of 8,609,435, 7,441,673 and 9,140,164 subreads were generated from the larval, juvenile, and adult sample pools, with mean subread lengths of 2,109.9, 1,988.2 and 1,996.2 bp, respectively. All samples were combined to increase transcript recovery and clustered into 35,661 high-quality reads. This is the first report on a full-length transcriptome from R. canadum. Our results illustrate a significant increase in the number of identified cobia lncRNAs and alternatively spliced transcripts, which will help improve genome annotation. Furthermore, this information will be beneficial for nutrigenomics and functional studies on cobia and other commercially important mariculture species.
16
Velasco VME, Ferreira A, Zaman S, Noordermeer D, Ensminger I, Wegrzyn JL. A long-read and short-read transcriptomics approach provides the first high-quality reference transcriptome and genome annotation for Pseudotsuga menziesii (Douglas-fir). G3 (BETHESDA, MD.) 2023; 13:jkac304. [PMID: 36454025 PMCID: PMC10468028 DOI: 10.1093/g3journal/jkac304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2021] [Revised: 12/13/2021] [Accepted: 10/19/2022] [Indexed: 12/02/2022]
Abstract
Douglas-fir (Pseudotsuga menziesii) is native to western North America. It grows in a wide range of environmental conditions and is an important timber tree. Although there are several studies on the gene expression responses of Douglas-fir to abiotic cues, the absence of high-quality transcriptome and genome data is a barrier to further investigation. As for most conifers, the available transcriptome and genome reference dataset for Douglas-fir remains fragmented and requires refinement. We aimed to generate a highly accurate and complete reference transcriptome and genome annotation. We deep-sequenced the transcriptome of Douglas-fir needles from seedlings that were grown under nonstress control conditions or a combination of heat and drought stress conditions using long-read (LR) and short-read (SR) sequencing platforms. We used 2 computational approaches, namely de novo and genome-guided LR transcriptome assembly. Using the LR de novo assembly, we identified 1.3X more high-quality transcripts, 1.85X more "complete" genes, and 2.7X more functionally annotated genes compared to the genome-guided assembly approach. We predicted 666 long noncoding RNAs and 12,778 unique protein-coding transcripts including 2,016 putative transcription factors. We leveraged the LR de novo assembled transcriptome with paired-end SR and a published single-end SR transcriptome to generate an improved genome annotation. This was conducted with BRAKER2 and refined based on functional annotation, repetitive content, and transcriptome alignment. This high-quality genome annotation has 51,419 unique gene models derived from 322,631 initial predictions. Overall, our informatics approach provides a new reference Douglas-fir transcriptome assembly and genome annotation with considerably improved completeness and functional annotation.
Affiliation(s)
- Alyssa Ferreira
  - Department of Evolution and Ecology, University of Connecticut, Storrs, CT 06269, USA
- Sumaira Zaman
  - Department of Evolution and Ecology, University of Connecticut, Storrs, CT 06269, USA
- Devin Noordermeer
  - Department of Biology, University of Toronto, Mississauga, ON L5L 1C8, Canada
  - Graduate Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S, Canada
- Ingo Ensminger
  - Department of Biology, University of Toronto, Mississauga, ON L5L 1C8, Canada
  - Graduate Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S, Canada
  - Graduate Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S, Canada
- Jill L Wegrzyn
  - Department of Evolution and Ecology, University of Connecticut, Storrs, CT 06269, USA
17
Tao F, Fan C, Liu Y, Sivakumar S, Kowalski KP, Golenberg EM. Optimization and application of non-native Phragmites australis transcriptome assemblies. PLoS One 2023; 18:e0280354. [PMID: 36689482 PMCID: PMC9870158 DOI: 10.1371/journal.pone.0280354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 12/27/2022] [Indexed: 01/24/2023]
Abstract
Phragmites australis (common reed) has a cosmopolitan distribution and has been suggested as a model organism for the study of invasive plant species. In North America, the non-native subspecies (ssp. australis) is widely distributed across the contiguous 48 states in the United States and large parts of Canada. Even though millions of dollars are spent annually on Phragmites management, insufficient knowledge of P. australis has impeded the efficiency of management. To address this problem, transcriptomic information generated from multiple types of tissue could be a valuable resource for future studies. Here, we constructed forty-nine P. australis transcriptome assemblies using different assembly tools and multiple parameter settings. The optimal transcriptome assembly for functional annotation and downstream analyses was selected from among these assemblies by comprehensive assessment. Of the 422,589 transcripts in this assembly, 319,046 (75.5%) have at least one functional annotation. Within the transcriptome assembly, we further identified 1,495 transcripts showing tissue-specific expression patterns, 10,828 putative transcription factors, and 72,165 candidate simple sequence repeat markers. The identification and analysis of predicted transcripts related to herbicide- and salinity-resistance genes are presented as two applications of the transcriptomic information to facilitate further research on P. australis. Careful assembly and selection of the transcriptome are important for its annotation. With this optimal transcriptome assembly and all related information from downstream analyses, we have helped to establish foundations for future studies on the mechanisms underlying the invasiveness of the non-native P. australis subspecies.
Affiliation(s)
- Feng Tao
  - Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
- Chuanzhu Fan
  - Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
- Yimin Liu
  - Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
- Subashini Sivakumar
  - Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
- Kurt P. Kowalski
  - U.S. Geological Survey-Great Lakes Science Center, Ann Arbor, MI, United States of America
- Edward M. Golenberg
  - Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
18
Farkas C, Recabal A, Mella A, Candia-Herrera D, Olivero MG, Haigh JJ, Tarifeño-Saldivia E, Caprile T. annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing. Gigascience 2022; 11:6874526. [PMID: 36472574 PMCID: PMC9724561 DOI: 10.1093/gigascience/giac099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/22/2022] [Revised: 07/22/2022] [Accepted: 09/28/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. RESULTS We developed an easy-to-use genome-guided transcriptome annotation pipeline that uses assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integration of several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons from the chicken SCO-spondin gene (containing more than 105 exons), including the identification of missing genes in the chicken reference annotations by homology assignments. CONCLUSIONS Our method helps to improve the current transcriptome annotation of the chicken brain. Our pipeline, implemented on Anaconda/Nextflow and Docker is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes.
Affiliation(s)
- Antonia Recabal
  - Departamento de Biología Celular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Andy Mella
  - Instituto de Ciencias Naturales, Universidad de las Américas, Chile
  - Centro Integrativo de Biología y Química Aplicada (CIBQA), Universidad Bernardo O'Higgins, Santiago 8370854, Chile
- Daniel Candia-Herrera
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Maryori González Olivero
  - Departamento de Biología Celular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Jody Jonathan Haigh
  - CancerCare Manitoba Research Institute, Winnipeg, MB, Canada
  - Department of Pharmacology and Therapeutics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
19
Kretzmer C, Narasimhan RL, Lal RD, Balassi V, Ravellette J, Kotekar Manjunath AK, Koshy JJ, Viano M, Torre S, Zanda VM, Kumravat M, Saldanha KMR, Chandranpillai H, Nihad I, Zhong F, Sun Y, Gustin J, Borgschulte T, Liu J, Razafsky D. De novo assembly and annotation of the CHOZN® GS-/- genome supports high-throughput genome-scale screening. Biotechnol Bioeng 2022; 119:3632-3646. [PMID: 36073082 PMCID: PMC9825924 DOI: 10.1002/bit.28226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Revised: 07/20/2022] [Accepted: 08/28/2022] [Indexed: 01/11/2023]
Abstract
Chinese hamster ovary (CHO) cells have been used as the industry standard for the production of therapeutic monoclonal antibodies for several decades. Despite significant improvements in commercial-scale production processes and media, the CHO cell has remained largely unchanged. Due to the cost and complexity of whole-genome sequencing and gene-editing it has been difficult to obtain the tools necessary to improve the CHO cell line. With the advent of next-generation sequencing and the discovery of the CRISPR/Cas9 system it has become more cost effective to sequence and manipulate the CHO genome. Here, we provide a comprehensive de novo assembly and annotation of the CHO-K1 based CHOZN® GS-/- genome. Using this platform, we designed, built, and confirmed the functionality of a whole genome CRISPR guide RNA library that will allow the bioprocessing community to design a more robust CHO cell line leading to the production of life saving medications in a more cost-effective manner.
Affiliation(s)
- Corey Kretzmer
  - Upstream Research and Development, MilliporeSigma, St. Louis, Missouri, USA
- Rajagopalan Lakshmi Narasimhan
  - Bioinformatics, IT R&D Applications, Merck (Sigma-Aldrich Chemicals Pvt. Ltd., A subsidiary of Merck KGaA, Darmstadt, Germany), Bangalore, India
- Rahul Deva Lal
  - Bioinformatics, IT R&D Applications, Merck (Sigma-Aldrich Chemicals Pvt. Ltd., A subsidiary of Merck KGaA, Darmstadt, Germany), Bangalore, India
- Vincent Balassi
  - Upstream Research and Development, MilliporeSigma, St. Louis, Missouri, USA
- James Ravellette
  - Upstream Research and Development, MilliporeSigma, St. Louis, Missouri, USA
- Ajaya Kumar Kotekar Manjunath
  - Bioinformatics, IT R&D Applications, Merck (Sigma-Aldrich Chemicals Pvt. Ltd., A subsidiary of Merck KGaA, Darmstadt, Germany), Bangalore, India
- Jesvin Joy Koshy
  - Bioinformatics, IT R&D Applications, Merck (Sigma-Aldrich Chemicals Pvt. Ltd., A subsidiary of Merck KGaA, Darmstadt, Germany), Bangalore, India
- Marta Viano
  - Istituto di Ricerche Biomediche “A. Marxer” RBM S.p.A., Ivrea, Italy
- Serena Torre
  - Istituto di Ricerche Biomediche “A. Marxer” RBM S.p.A., Ivrea, Italy
- Valeria M. Zanda
  - Istituto di Ricerche Biomediche “A. Marxer” RBM S.p.A., Ivrea, Italy
- Mausam Kumravat
  - Bioinformatics, IT R&D Applications, Merck (Sigma-Aldrich Chemicals Pvt. Ltd., A subsidiary of Merck KGaA, Darmstadt, Germany), Bangalore, India
- Keith Metelo Raul Saldanha
  - Bioinformatics, IT R&D Applications, Merck (Sigma-Aldrich Chemicals Pvt. Ltd., A subsidiary of Merck KGaA, Darmstadt, Germany), Bangalore, India
- Harikrishnan Chandranpillai
  - Bioinformatics, IT R&D Applications, Merck (Sigma-Aldrich Chemicals Pvt. Ltd., A subsidiary of Merck KGaA, Darmstadt, Germany), Bangalore, India
- Ifra Nihad
  - Bioinformatics, IT R&D Applications, Merck (Sigma-Aldrich Chemicals Pvt. Ltd., A subsidiary of Merck KGaA, Darmstadt, Germany), Bangalore, India
- Fei Zhong
  - Life Science Bioinformatics, IT, MilliporeSigma, St. Louis, Missouri, USA
- Yi Sun
  - Bioinformatics, IT R&D Applications, MilliporeSigma, St. Louis, Missouri, USA
- Jason Gustin
  - Upstream Research and Development, MilliporeSigma, St. Louis, Missouri, USA
- Jiajian Liu
  - Life Science Bioinformatics, IT, MilliporeSigma, St. Louis, Missouri, USA
- David Razafsky
  - Upstream Research and Development, MilliporeSigma, St. Louis, Missouri, USA
20
Shafranskaya D, Kale V, Finn R, Lapidus AL, Korobeynikov A, Prjibelski AD. MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data. Front Microbiol 2022; 13:981458. [PMID: 36386613 PMCID: PMC9651917 DOI: 10.3389/fmicb.2022.981458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 09/29/2022] [Indexed: 11/25/2022]
Abstract
While metagenome sequencing may provide insights into the genome sequences and composition of microbial communities, metatranscriptome analysis can be useful for studying the functional activity of a microbiome. RNA-Seq data make it possible to determine which genes are active in the community and how their expression levels depend on external conditions. Although the field of metatranscriptomics is relatively young, the number of projects related to metatranscriptome analysis increases every year and the scope of its applications expands. However, several problems complicate metatranscriptome analysis: the complexity of microbial communities, the wide dynamic range of transcriptome expression and, importantly, the lack of high-quality computational methods for assembling meta-RNA sequencing data. These factors degrade the contiguity and completeness of metatranscriptome assemblies, thereby affecting downstream analysis. Here we present MetaGT, a pipeline for de novo assembly of metatranscriptomes, which is based on the idea of combining metatranscriptomic and metagenomic data sequenced from the same sample. MetaGT assembles metatranscriptomic contigs and fills in missing regions based on their alignments to the metagenome assembly. This approach makes it possible to overcome the complexities described above, obtain complete RNA sequences, and additionally estimate their abundances. Using various publicly available real and simulated datasets, we demonstrate that MetaGT yields a significant improvement in the coverage and completeness of metatranscriptome assemblies compared to existing methods that do not exploit metagenomic data. The pipeline is implemented in Nextflow and is freely available from https://github.com/ablab/metaGT.
Affiliation(s)
- Daria Shafranskaya
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Varsha Kale
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Rob Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Alla L. Lapidus
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Andrey D. Prjibelski
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
  - *Correspondence: Andrey D. Prjibelski,
21
Identification of Candidate Chemosensory Gene Families by Head Transcriptomes Analysis in the Mexican Fruit Fly, Anastrepha ludens Loew (Diptera: Tephritidae). Int J Mol Sci 2022; 23:ijms231810531. [PMID: 36142444 PMCID: PMC9500802 DOI: 10.3390/ijms231810531] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/06/2022] [Revised: 08/31/2022] [Accepted: 09/06/2022] [Indexed: 11/16/2022]
Abstract
Insect chemosensory systems, such as smell and taste, are mediated by chemosensory receptor and non-receptor protein families. In the last decade, many studies have focused on discovering these families in Tephritidae species of agricultural importance. However, to date, there is no information on the Mexican fruit fly Anastrepha ludens Loew, a priority pest of quarantine importance in Mexico and other countries. This work represents the first effort to identify, classify and characterize the six chemosensory gene families by analyzing two head transcriptomes of sexually immature and mature adults of A. ludens from laboratory-reared and wild populations, respectively. We identified 120 chemosensory genes encoding 31 Odorant-Binding Proteins (OBPs), 5 Chemosensory Proteins (CSPs), 2 Sensory Neuron Membrane Proteins (SNMPs), 42 Odorant Receptors (ORs), 17 Ionotropic Receptors (IRs), and 23 Gustatory Receptors (GRs). The 120 described chemosensory proteins of the Mexican fruit fly significantly contribute to the genetic databases of insects, particularly dipterans. Except for some OBPs, this work reports for the first time the repertoire of olfactory proteins for one species of the genus Anastrepha, which provides a further basis for studying the olfactory system in the family Tephritidae, one of the most important for its economic and social impact worldwide.
|
22
|
Sheikh-Assadi M, Naderi R, Salami SA, Kafi M, Fatahi R, Shariati V, Martinelli F, Cicatelli A, Triassi M, Guarino F, Improta G, Claros MG. Normalized Workflow to Optimize Hybrid De Novo Transcriptome Assembly for Non-Model Species: A Case Study in Lilium ledebourii (Baker) Boiss. PLANTS 2022; 11:plants11182365. [PMID: 36145766 PMCID: PMC9503428 DOI: 10.3390/plants11182365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/23/2022] [Revised: 08/21/2022] [Accepted: 09/07/2022] [Indexed: 11/16/2022]
Abstract
A high-quality transcriptome is required to advance numerous bioinformatics workflows. Nevertheless, the effectiveness of de novo assembly tools and the real precision of assembled transcriptomes remain somewhat unexplored, particularly for non-model organisms with complicated (very long, heterozygous, polyploid) genomes. To assess the performance of various transcriptome assembly programs, this study built 11 single assemblies and evaluated them on several reference-free and reference-based criteria. To reconfirm the benchmark outputs, 55 BLAST searches were performed and compared using the 11 constructed transcriptomes. In brief, normalized benchmarking showed that Velvet–Oases produced the worst results, while the EvidentialGene strategy provided the most comprehensive and accurate transcriptome of Lilium ledebourii (Baker) Boiss. The BLAST results also confirmed the superiority of EvidentialGene, which captured up to 59% more unique gene hits than Velvet–Oases. To promote assembly optimization, normalized benchmarking, PCA and AHC were used to emphasize that each metric describes only part of the transcriptome status, and that one should never settle for just a few evaluation criteria. This study supplies a framework for benchmarking and optimizing assembly approaches for RNA-Seq data and shows that selecting an inefficient assembly strategy may reduce the number of unique gene hits identified.
Affiliation(s)
- Morteza Sheikh-Assadi
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj 31587-77871, Iran
  - Correspondence: (M.S.-A.); (R.N.)
- Roohangiz Naderi
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj 31587-77871, Iran
  - Correspondence: (M.S.-A.); (R.N.)
- Seyed Alireza Salami
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj 31587-77871, Iran
- Mohsen Kafi
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj 31587-77871, Iran
- Reza Fatahi
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj 31587-77871, Iran
- Vahid Shariati
  - NIGEB Genome Center, National Institute of Genetic Engineering and Biotechnology, Tehran 14965/161, Iran
- Federico Martinelli
  - Department of Biology, University of Florence, 50019 Sesto Fiorentino, Italy
- Angela Cicatelli
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, 84084 Fisciano, Italy
- Maria Triassi
  - Department of Public Health, University of Naples “Federico II”, 80131 Naples, Italy
- Francesco Guarino
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, 84084 Fisciano, Italy
- Giovanni Improta
  - Department of Public Health, University of Naples “Federico II”, 80131 Naples, Italy
- Manuel Gonzalo Claros
  - Molecular Biology and Biochemistry Department, University of Málaga, 29071 Málaga, Spain
  - CIBER de Enfermedades Raras (CIBERER), 29071 Málaga, Spain
  - Institute of Biomedical Research in Málaga (IBIMA), IBIMA-RARE, 29010 Málaga, Spain
  - Instituto de Hortofruticultura Subtropical y Mediterránea (IHSM-UMA-CSIC), 29010 Málaga, Spain
|
23
|
Abstract
The platyrrhine family Cebidae (capuchin and squirrel monkeys) exhibits some of the largest primate encephalization quotients. Each cebid lineage is also characterized by notable lineage-specific traits, with capuchins showing striking similarities to Hominidae such as high sensorimotor intelligence with tool use, advanced cognitive abilities, and behavioral flexibility. Here, we take a comparative genomics approach, performing genome-wide tests for positive selection across five cebid branches, to gain insight into major periods of cebid adaptive evolution. We uncover candidate targets of selection across cebid evolutionary history that may underlie the emergence of lineage-specific traits. Our analyses highlight shifting and sustained selective pressures on genes related to brain development, longevity, reproduction, and morphology, including evidence for cumulative and diversifying neurobiological adaptations across cebid evolution. In addition to generating a high-quality reference genome assembly for robust capuchins, our results contribute to a better understanding of the adaptive diversification of this distinctive primate clade.
|
24
|
Proteotranscriptomics - A facilitator in omics research. Comput Struct Biotechnol J 2022; 20:3667-3675. [PMID: 35891789 PMCID: PMC9293588 DOI: 10.1016/j.csbj.2022.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Revised: 07/04/2022] [Accepted: 07/04/2022] [Indexed: 11/26/2022] Open
Abstract
Applications in omics research, such as comparative transcriptomics and proteomics, require the knowledge of the species-specific gene sequence and benefit from a comprehensive high-quality annotation of the coding genes to achieve high coverage. While protein-coding genes can in simple cases be detected by scanning the genome for open reading frames, in more complex genomes exonic sequences are separated by introns. Despite advances in sequencing technologies that allow for ever-growing numbers of genomes, the quality of many of the provided genome assemblies does not reach reference quality. These non-contiguous assemblies with gaps and the necessity to predict splice sites limit accurate gene annotation from solely genomic data. In contrast, the transcriptome only contains transcribed gene regions, is devoid of introns and thus provides the optimal basis for the identification of open reading frames. The additional integration of proteomics data to validate predicted protein-coding genes further enriches for accurate gene models. This review outlines the principles of the proteotranscriptomics approach, discusses common challenges and suggests methods for improvement.
|
25
|
Analysis of Transcriptome Difference between Blood-Fed and Starved Tropical Bed Bug, Cimex hemipterus (F.) (Hemiptera: Cimicidae). INSECTS 2022; 13:insects13040387. [PMID: 35447830 PMCID: PMC9029146 DOI: 10.3390/insects13040387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 04/01/2022] [Accepted: 04/12/2022] [Indexed: 11/19/2022]
Abstract
Simple Summary: Bed bugs are well known for their extreme resilience to starvation. The molecular mechanisms behind this ability, however, are little known. Thus, the whole transcriptomes of blood-fed and starved bed bugs from the species Cimex hemipterus (tropical bed bugs) were sequenced and compared. The transcriptome of tropical bed bugs was first annotated. Following differentially expressed gene (DEG) analysis, regulated transcripts were mostly identified in biological processes during blood-feeding and starvation. The results provide an overview of the proportion of functional genes in this species and a deeper understanding of the bed bug’s molecular mechanism of resistance to blood feeding and starvation.
Abstract: The reference transcriptome for Cimex hemipterus (tropical bed bug) was assembled de novo in this study, and differential expression analysis was conducted between blood-fed and starved tropical bed bugs. A total of 24,609 transcripts were assembled, with around 79% of them being annotated against the Eukaryotic Orthologous Groups (KOG) database. The transcriptomic comparison revealed several differentially expressed genes between blood-fed and starved bed bugs, with 38 of them being identifiable. There were 20 and 18 genes significantly upregulated in blood-fed and starved bed bugs, respectively. Differentially expressed genes (DEGs) were revealed to be associated with regulation, metabolism, transport, motility, immune, and stress response; endocytosis; and signal transduction. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis showed an enrichment of genes encoding steroid biosynthesis, glycosaminoglycan biosynthesis, butanoate metabolism, and autophagy in both blood-fed and starved bed bugs. However, in blood-fed bed bugs, genes involved in histidine metabolism, caffeine metabolism, ubiquinone/terpenoid-quinone biosynthesis, and sulfur relay system were enriched. On the other hand, starvation activates genes related to nicotinate and nicotinamide metabolism, fatty acid elongation, terpenoid backbone biosynthesis, metabolism of xenobiotics by cytochrome P450, riboflavin metabolism, apoptosis, and protein export. The present study is the first to report a de novo transcriptomic analysis in C. hemipterus and demonstrates differential responses of bed bugs to blood-feeding and starvation.
|
26
|
Webster C, Figueroa‐Corona L, Méndez‐González ID, Álvarez‐Soto L, Neale DB, Jaramillo‐Correa JP, Wegrzyn JL, Vázquez‐Lobo A. Comparative analysis of differential gene expression indicates divergence in ontogenetic strategies of leaves in two conifer genera. Ecol Evol 2022; 12:e8611. [PMID: 35222971 PMCID: PMC8848466 DOI: 10.1002/ece3.8611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2021] [Revised: 12/21/2021] [Accepted: 01/23/2022] [Indexed: 11/09/2022] Open
Abstract
In land plants, heteroblasty broadly refers to a drastic change in morphology during growth through ontogeny. Juniperus flaccida and Pinus cembroides are conifers of independent lineages known to exhibit leaf heteroblasty between the juvenile and adult life stage of development. Juvenile leaves of P. cembroides develop spirally on the main stem and appear decurrent, flattened, and needle‐like; whereas adult photosynthetic leaves are triangular or semi‐circular needle‐like, and grow in whorls on secondary or tertiary compact dwarf shoots. By comparison, J. flaccida juvenile leaves are decurrent and needle‐like, and adult leaves are compact, short, and scale‐like. Comparative analyses were performed to evaluate differences in anatomy and gene expression patterns between developmental phases in both species. RNA from 12 samples was sequenced and analyzed with available software. Reference transcriptomes were assembled de novo from the RNA‐Seq reads. Following assembly, 63,741 high‐quality transcripts were functionally annotated in P. cembroides and 69,448 in J. flaccida. Evaluation of the orthologous groups yielded 4140 shared gene families among the four references (adult and juvenile from each species). Activities related to cell division and development were more abundant in juveniles than adults in P. cembroides, and more abundant in adults than juveniles in J. flaccida. Overall, there were 509 up‐regulated and 81 down‐regulated genes in the juvenile condition of P. cembroides and 14 up‐regulated and 22 down‐regulated genes in J. flaccida. Gene interaction network analysis showed evidence of co‐expression and co‐localization of up‐regulated genes involved in cell wall and cuticle formation, development, and phenylpropanoid pathway, in juvenile P. cembroides leaves. In J. flaccida, by contrast, differential expression and gene interaction patterns were detected in genes involved in photosynthesis and chloroplast biogenesis. Although J. flaccida and P. cembroides both exhibit leaf heteroblastic development, little overlap was detected, and unique genes and pathways were highlighted in this study.
Affiliation(s)
- Cynthia Webster
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut, USA
- Laura Figueroa‐Corona
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Iván David Méndez‐González
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Lluvia Álvarez‐Soto
  - Facultad de Ciencias Biológicas, Universidad Autónoma del Estado de Morelos, Cuernavaca, México
- David B. Neale
  - Department of Plant Sciences, University of California, Davis, California, USA
- Juan Pablo Jaramillo‐Correa
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Jill L. Wegrzyn
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut, USA
- Alejandra Vázquez‐Lobo
  - Centro de Investigación en Biodiversidad y Conservación, Universidad Autónoma del Estado de Morelos, Cuernavaca, México
|
27
|
Rivera-Vicéns RE, Garcia-Escudero CA, Conci N, Eitel M, Wörheide G. TransPi - a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly. Mol Ecol Resour 2022; 22:2070-2086. [PMID: 35119207 DOI: 10.1111/1755-0998.13593] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2021] [Revised: 01/10/2022] [Accepted: 01/24/2022] [Indexed: 11/30/2022]
Abstract
The use of RNA-Seq data and the generation of de novo transcriptome assemblies have been pivotal for studies in ecology and evolution. This is particularly true for non-model organisms, where no genome information is available. In such organisms, studies of differential gene expression, DNA enrichment baits design, and phylogenetics can all be accomplished with de novo transcriptome assemblies. Multiple tools are available for transcriptome assembly; however, no single tool can provide the best assembly for all datasets. Therefore, a multi-assembler approach, followed by a reduction step, is often sought to generate an improved representation of the assembly. To reduce errors in these complex analyses while at the same time attaining reproducibility and scalability, automated workflows have been essential in the analysis of RNA-Seq data. However, most of these tools are designed for species where genome data is used as reference for the assembly process, limiting their use in non-model organisms. We present TransPi, a comprehensive pipeline for de novo transcriptome assembly that requires minimal user input without sacrificing thorough analysis. Combinations of different model organisms, k-mer sets, read lengths, and read quantities were used to assess the tool. Furthermore, a total of 49 non-model organisms, spanning different phyla, were also analysed. Compared to approaches using single assemblers only, TransPi produces higher BUSCO completeness percentages, and a concurrent significant reduction in duplication rates. TransPi is easy to configure and can be deployed seamlessly using Conda, Docker and Singularity.
Affiliation(s)
- R E Rivera-Vicéns
  - Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, 80333, München, Germany
- C A Garcia-Escudero
  - Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, 80333, München, Germany
  - Graduate School for Evolution, Ecology and Systematics, Faculty of Biology, Ludwig-Maximilians-Universität München, Biozentrum Großhaderner Str. 2, 82152, Planegg-Martinsried, Germany
- N Conci
  - Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, 80333, München, Germany
- M Eitel
  - Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, 80333, München, Germany
- G Wörheide
  - Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, 80333, München, Germany
  - GeoBio-Center, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, 80333, München, Germany
  - SNSB-Bayerische Staatssammlung für Paläontologie und Geologie, Richard-Wagner-Str. 10, 80333, München, Germany
|
28
|
Raghavan V, Kraft L, Mesny F, Rigerte L. A simple guide to de novo transcriptome assembly and annotation. Brief Bioinform 2022; 23:6514404. [PMID: 35076693 PMCID: PMC8921630 DOI: 10.1093/bib/bbab563] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Revised: 12/03/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022] Open
Abstract
A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.
Affiliation(s)
- Venket Raghavan
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany (corresponding author)
- Louis Kraft
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany (corresponding author)
|
29
|
Bzikadze AV, Mikheenko A, Pevzner PA. Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res 2022; 32:2107-2118. [PMID: 36379716 PMCID: PMC9808623 DOI: 10.1101/gr.276871.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Recent advancements in long-read sequencing have enabled the telomere-to-telomere (complete) assembly of a human genome and are now contributing to the haplotype-resolved complete assemblies of multiple human genomes. Because the accuracy of read mapping tools deteriorates in highly repetitive regions, there is a need to develop accurate, error-exposing (detecting potential assembly errors), and diploid-aware (distinguishing different haplotypes) tools for read mapping in complete assemblies. We describe VerityMap, the first accurate, error-exposing, and partially diploid-aware tool for long-read mapping to complete assemblies.
Affiliation(s)
- Andrey V. Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, California 92093, USA
- Alla Mikheenko
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, 199034, Russia
- Pavel A. Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA
|
30
|
Shmakov NА. Improving the quality of barley transcriptome de novo assembling by using a hybrid approach for lines with varying spike and stem coloration. Vavilovskii Zhurnal Genet Selektsii 2021; 25:30-38. [PMID: 34901701 PMCID: PMC8627909 DOI: 10.18699/vj21.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2020] [Revised: 01/15/2021] [Accepted: 01/15/2021] [Indexed: 11/19/2022] Open
Abstract
De novo transcriptome assembly is an important stage of RNA-seq data computational analysis. It allows researchers to obtain the sequences of transcripts present in the biological sample of interest. The availability of an accurate and complete transcriptome sequence of the organism of interest is, in turn, an indispensable condition for further analysis of RNA-seq data. Through years of transcriptomic research, the bioinformatics community has developed a number of assembler programs for transcriptome reconstruction from short reads of RNA-seq libraries. Different assemblers make it possible to conduct de novo transcriptome reconstruction and genome-guided reconstruction. The majority of the assemblers working with RNA-seq data are based on the De Bruijn graph method of sequence reconstruction. However, the specifics of their procedures can vary drastically, as do their results. A number of authors recommend a hybrid approach to transcriptome reconstruction based on combining the results of several assemblers in order to achieve a better transcriptome assembly. The advantage of this approach has been demonstrated in a number of studies, with RNA-seq experiments conducted on the Illumina platform. In this paper, we propose a hybrid approach for creating a transcriptome assembly of the barley Hordeum vulgare isogenic line Bowman and two nearly isogenic lines contrasting in spike pigmentation, based on the results of sequencing on the IonTorrent platform. This approach implements several de novo assemblers: Trinity, Trans-ABySS and rnaSPAdes. Several assembly metrics were examined: the percentage of reference transcripts observed in the assemblies, the percentage of RNA-seq reads involved, and BUSCO scores. It was shown that, based on the summation of these metrics, the transcriptome meta-assembly surpasses the individual transcriptome assemblies it consists of.
Affiliation(s)
- N А Shmakov
  - Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Kurchatov Genomics Center, Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
31
|
Resolving the microalgal gene landscape at the strain level: A novel hybrid transcriptome of Emiliania huxleyi CCMP3266. Appl Environ Microbiol 2021; 88:e0141821. [PMID: 34757817 DOI: 10.1128/aem.01418-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Microalgae are key ecological players with a complex evolutionary history. Genomic diversity, in addition to limited availability of high-quality genomes, challenges studies that aim to elucidate molecular mechanisms underlying microalgal ecophysiology. Here, we present a novel and comprehensive transcriptomic hybrid approach to generate a reference for genetic analyses, and resolve the microalgal gene landscape at the strain level. The approach is demonstrated for a strain of the coccolithophore microalga Emiliania huxleyi, which is a species complex with considerable genome variability. The investigated strain is commonly studied as a model for algal-bacterial interactions, and was therefore sequenced in the presence of bacteria to elicit the expression of interaction-relevant genes. We applied complementary PacBio Iso-Seq full-length cDNA, and poly(A)-independent Illumina total RNA sequencing, which resulted in a de novo assembled, near complete hybrid transcriptome. In particular, hybrid sequencing improved the reconstruction of long transcripts and increased the recovery of full-length transcript isoforms. To use the resulting hybrid transcriptome as a reference for genetic analyses, we demonstrate a method that collapses the transcriptome into a genome-like dataset, termed "synthetic genome" (sGenome). We used the sGenome as a reference to visually confirm the robustness of the CCMP3266 gene assembly, to conduct differential gene expression analysis, and to characterize novel E. huxleyi genes. The newly-identified genes contribute to our understanding of E. huxleyi genome diversification, and are predicted to play a role in microbial interactions. Our transcriptomic toolkit can be implemented in various microalgae to facilitate mechanistic studies on microalgal diversity and ecology.
Importance: Microalgae are key players in the ecology and biogeochemistry of our oceans. Efforts to implement genomic and transcriptomic tools in laboratory studies involving microalgae suffer from the lack of published genomes. In the case of coccolithophore microalgae, the problem has long been recognized; the model species Emiliania huxleyi is a species complex with genomes composed of a core, and a large variable portion. To study the role of the variable portion in niche adaptation, and specifically in microbial interactions, strain-specific genetic information is required. Here we present a novel transcriptomic hybrid approach, and generate strain-specific genome-like information. We demonstrate our approach on an E. huxleyi strain that is co-cultivated with bacteria. By constructing a "synthetic genome", we generated comprehensive gene annotations that enabled accurate analyses of gene expression patterns. Importantly, we unveiled novel genes in the variable portion of E. huxleyi that play putative roles in microbial interactions.
|
32
|
Taheri-Dehkordi A, Naderi R, Martinelli F, Salami SA. Computational screening of miRNAs and their targets in saffron (Crocus sativus L.) by transcriptome mining. PLANTA 2021; 254:117. [PMID: 34751821 DOI: 10.1007/s00425-021-03761-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Accepted: 10/18/2021] [Indexed: 06/13/2023]
Abstract
A robust workflow for the identification of miRNAs and their targets in saffron was developed. MicroRNA-mediated gene regulation in saffron is potentially involved in several biological processes, including the biosynthesis of highly valuable apocarotenoids. Saffron (Crocus sativus L.) is the most expensive spice in the world and a major source of apocarotenoids. Even though miRNAs (20-24 nt non-coding small RNAs) are important regulators of gene expression at transcriptional and post-transcriptional levels, their role in saffron has not been thoroughly investigated. As a result, a workflow for computational identification of miRNAs and their targets can be useful to uncover the regulatory networks underlying biological processes in this valuable plant. The efficiency of several assembly tools such as Trans-ABySS, Trinity, Bridger, rnaSPAdes, and EvidentialGene was evaluated based on both reference-based and reference-free metrics using transcriptome data. A reliable workflow for computational identification of miRNAs and their targets in saffron was described. EvidentialGene was found to be the most efficient de novo transcriptome assembler for saffron as a complex triploid model, followed by Trinity. In total, 66 miRNAs from 19 different families that target 2880 genes, including several transcription factors involved in the flowering transition, were identified. Three of the identified targets were involved in terpenoid backbone biosynthesis. CsCCD and CsUGT genes involved in the apocarotenoid biosynthetic pathway were targeted by csa-miR156g and csa-miR156b-3p, revealing a unique post-transcriptional regulation dynamic in saffron. The identified miRNAs and their targets add to our understanding of the many biological roles of miRNAs in saffron and shed new light on the control of the apocarotenoid biosynthetic pathway in this valuable plant.
Affiliation(s)
- Ayat Taheri-Dehkordi
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj, Iran
- Roohangiz Naderi
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj, Iran
- Seyed Alireza Salami
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj, Iran
|
33
|
De novo Assembly, Annotation, and Analysis of Transcriptome Data of the Ladakh Ground Skink Provide Genetic Information on High-Altitude Adaptation. Genes (Basel) 2021; 12:genes12091423. [PMID: 34573405 PMCID: PMC8466045 DOI: 10.3390/genes12091423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2021] [Revised: 09/13/2021] [Accepted: 09/13/2021] [Indexed: 11/17/2022] Open
Abstract
The Himalayan Arc is recognized as a global biodiversity hotspot. Among its numerous cryptic and undiscovered organisms, this composite high-mountain ecosystem harbors many taxa with adaptations to life in high elevations. However, evolutionary patterns and genomic features have rarely been studied in Himalayan vertebrates. Here, we provide the first well-annotated transcriptome of a Greater Himalayan reptile species, the Ladakh Ground skink Asymblepharus ladacensis (Squamata: Scincidae). Based on tissues from the brain, an embryonic disc, and pooled organ material, using paired-end Illumina NextSeq 500 RNAseq, we assembled ~77,000 transcripts, which were annotated using seven functional databases. We tested ~1600 genes, known to be under positive selection in anurans and reptiles adapted to high elevations, and potentially detected positive selection for 114 of these genes in Asymblepharus. Even though the strength of these results is limited due to the single-animal approach, our transcriptome resource may be valuable data for further studies on squamate reptile evolution in the Himalayas as a hotspot of biodiversity.
|
34
|
Perez R, de Souza Araujo N, Defrance M, Aron S. Molecular adaptations to heat stress in the thermophilic ant genus Cataglyphis. Mol Ecol 2021; 30:5503-5516. [PMID: 34415643 DOI: 10.1111/mec.16134] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/06/2021] [Revised: 08/13/2021] [Accepted: 08/16/2021] [Indexed: 12/13/2022]
Abstract
Over the last decade, increasing attention has been paid to the molecular adaptations used by organisms to cope with thermal stress. However, to date, few studies have focused on thermophilic species living in hot, arid climates. In this study, we explored molecular adaptations to heat stress in the thermophilic ant genus Cataglyphis, one of the world's most thermotolerant animal taxa. We compared heat tolerance and gene expression patterns across six Cataglyphis species from distinct phylogenetic groups that live in different habitats and experience different thermal regimes. We found that all six species had high heat tolerance levels with critical thermal maxima (CTmax) ranging from 43℃ to 45℃ and a median lethal temperature (LT50) ranging from 44.5℃ to 46.8℃. Transcriptome analyses revealed that, although the number of differentially expressed genes varied widely for the six species (from 54 to 1118), many were also shared. Functional annotation of the differentially expressed and co-expressed genes showed that the biological pathways involved in heat-shock responses were similar among species and were associated with four major processes: the regulation of transcriptional machinery and DNA metabolism; the preservation of proteome stability; the elimination of toxic residues; and the maintenance of cellular integrity. Overall, our results suggest that molecular responses to heat stress have been evolutionarily conserved in the ant genus Cataglyphis and that their diversity may help workers withstand temperatures close to their physiological limits.
Affiliation(s)
- Rémy Perez
- Department of Evolutionary Biology & Ecology, Université Libre de Bruxelles, Brussels, Belgium
| | - Natalia de Souza Araujo
- Department of Evolutionary Biology & Ecology, Université Libre de Bruxelles, Brussels, Belgium.,Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles, Brussels, Belgium
| | - Matthieu Defrance
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles, Brussels, Belgium
| | - Serge Aron
- Department of Evolutionary Biology & Ecology, Université Libre de Bruxelles, Brussels, Belgium
|
35
|
Metatranscriptomic Analysis of Bacterial Communities on Laundered Textiles: A Pilot Case Study. Microorganisms 2021; 9:microorganisms9081591. [PMID: 34442670 PMCID: PMC8400938 DOI: 10.3390/microorganisms9081591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 12/13/2022] Open
Abstract
Microbially contaminated washing machines and mild laundering conditions facilitate the survival and growth of microorganisms on laundry, promoting undesired side effects such as malodor formation. Clearly, a deeper understanding of the functionality and hygienic relevance of the laundry microbiota necessitates the analysis of the microbial gene expression on textiles after washing, which—to the best of our knowledge—has not been performed before. In this pilot case study, we used single-end RNA sequencing to generate de novo transcriptomes of the bacterial communities remaining on polyester and cotton fabrics washed in a domestic washing machine in mild conditions and subsequently incubated under moist conditions for 72 h. Two common de novo transcriptome assemblers were used. The final assemblies included 22,321 Trinity isoforms and 12,600 Spades isoforms. A large part of these isoforms could be assigned to the SwissProt database, and was further categorized into “molecular function”, “biological process” and “cellular component” using Gene Ontology (GO) terms. In addition, differential gene expression was used to show the difference in the pairwise comparison of the two tissue types. When comparing the assemblies generated with the two assemblers, the annotation results were relatively similar. However, there were clear differences between the de novo assemblies regarding differential gene expression.
|
36
|
Solovyeva A, Levakin I, Zorin E, Adonin L, Khotimchenko Y, Podgornaya O. Transposons-Based Clonal Diversity in Trematode Involves Parts of CR1 (LINE) in Eu- and Heterochromatin. Genes (Basel) 2021; 12:genes12081129. [PMID: 34440303 PMCID: PMC8392823 DOI: 10.3390/genes12081129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/15/2021] [Revised: 07/22/2021] [Accepted: 07/23/2021] [Indexed: 01/21/2023] Open
Abstract
Trematode parthenitae have long been believed to form clonal populations, but clonal diversity has been discovered in this asexual stage of the lifecycle. Clonal polymorphism in the model species Himasthla elongata has been previously described, but the source of this phenomenon remains unknown. In this work, we traced cercarial clonal diversity using a simplified amplified fragment length polymorphism (SAFLP) method and characterised the nature of fragments in diverse electrophoretic bands. The repetitive elements were identified in both the primary sequence of the H. elongata genome and in the transcriptome data. Long-interspersed nuclear elements (LINEs) and long terminal repeat retrotransposons (LTRs) were found to represent an overwhelming majority of the genome and the transposon transcripts. Most sequenced fragments from SAFLP pattern contained the reverse transcriptase (RT, ORF2) domains of LINEs, and only a few sequences belonged to ORFs of LTRs and ORF1 of LINEs. A fragment corresponding to a CR1-like (LINE) spacer region was discovered and named CR1-renegade (CR1-rng). In addition to RT-containing CR1 transcripts, we found short CR1-rng transcripts in the redia transcriptome and short contigs in the mobilome. Probes against CR1-RT and CR1-rng presented strikingly different pictures in FISH mapping, despite both being fragments of CR1. In silico data and Southern blotting indicated that CR1-rng is not tandemly organised. CR1 involvement in clonal diversity is discussed.
Affiliation(s)
- Anna Solovyeva
- Institute of Cytology of the Russian Academy of Science, Tikhoretsky Ave 4, 194064 Saint Petersburg, Russia;
- Zoological Institute of the Russian Academy of Sciences, Universitetskaya Nab 1, 199034 Saint Petersburg, Russia;
| | - Ivan Levakin
- Zoological Institute of the Russian Academy of Sciences, Universitetskaya Nab 1, 199034 Saint Petersburg, Russia;
| | - Evgeny Zorin
- All-Russia Research Institute for Agricultural Microbiology, Pushkin 8, 196608 Saint Petersburg, Russia;
| | - Leonid Adonin
- Moscow Institute of Physics and Technology, Institutskiy per 9, 141701 Dolgoprudny, Russia;
| | - Yuri Khotimchenko
- School of Biomedicine, Far Eastern Federal University, Sukhanova St 8, 690091 Vladivostok, Russia;
| | - Olga Podgornaya
- Institute of Cytology of the Russian Academy of Science, Tikhoretsky Ave 4, 194064 Saint Petersburg, Russia;
- Department of Cytology and Histology, Saint Petersburg State University, Universitetskaya Nab 7/9, 199034 Saint Petersburg, Russia
|
37
|
Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics 2020; 70:e102. [PMID: 32559359 DOI: 10.1002/cpbi.102] [Citation(s) in RCA: 899] [Impact Index Per Article: 299.7] [Indexed: 11/08/2022]
Abstract
SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results with use cases for each pipeline, and several additional support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC. Basic Protocol 1: Assembling isolate bacterial datasets; Basic Protocol 2: Assembling metagenomic datasets; Basic Protocol 3: Assembling sets of putative plasmids; Basic Protocol 4: Assembling transcriptomes; Basic Protocol 5: Assembling putative biosynthetic gene clusters; Support Protocol 1: Installing SPAdes; Support Protocol 2: Providing input via command line; Support Protocol 3: Providing input data via YAML format; Support Protocol 4: Restarting previous run; Support Protocol 5: Determining strand-specificity of RNA-seq data.
Affiliation(s)
- Andrey Prjibelski
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
| | - Dmitry Antipov
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
| | - Dmitry Meleshko
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
| | - Alla Lapidus
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia.,Department of Cytology and Histology, Saint Petersburg State University, Saint Petersburg, Russia
| | - Anton Korobeynikov
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia.,Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
|
38
|
Ali A, Thorgaard GH, Salem M. PacBio Iso-Seq Improves the Rainbow Trout Genome Annotation and Identifies Alternative Splicing Associated With Economically Important Phenotypes. Front Genet 2021; 12:683408. [PMID: 34335690 PMCID: PMC8321248 DOI: 10.3389/fgene.2021.683408] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/20/2021] [Accepted: 06/14/2021] [Indexed: 01/04/2023] Open
Abstract
Rainbow trout is an important model organism that has received concerted international efforts to study the transcriptome. For this purpose, short-read sequencing has been primarily used over the past decade. However, these sequences are too short to resolve the transcriptome complexity. This study reported a first full-length transcriptome assembly of the rainbow trout using single-molecule long-read isoform sequencing (Iso-Seq). Extensive computational approaches were used to refine and validate the reconstructed transcriptome. The study identified 10,640 high-confidence transcripts not previously annotated, in addition to 1,479 isoforms not mapped to the current Swanson reference genome. Most of the identified lncRNAs were non-coding variants of coding transcripts. The majority of genes had multiple transcript isoforms (average ∼3 isoforms/locus). Intron retention (IR) and exon skipping (ES) accounted for 56% of alternative splicing (AS) events. Iso-Seq improved the reference genome annotation, which allowed identification of characteristic AS associated with fish growth, muscle accretion, disease resistance, stress response, and fish migration. For instance, an ES in the GVIN1 gene existed in fish susceptible to bacterial cold-water disease (BCWD). In addition, under five stress conditions, there was a commonly regulated exon in the prolyl 4-hydroxylase subunit alpha-2 (P4HA2) gene. The reconstructed gene models and their posttranscriptional processing in rainbow trout provide invaluable resources that could be further used for future genetics and genomics studies. Additionally, the study identified characteristic transcription events associated with economically important phenotypes, which could be applied in selective breeding.
Affiliation(s)
- Ali Ali
- Department of Animal and Avian Sciences, University of Maryland, College Park, College Park, MD, United States
| | - Gary H. Thorgaard
- School of Biological Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA, United States
| | - Mohamed Salem
- Department of Animal and Avian Sciences, University of Maryland, College Park, College Park, MD, United States
|
39
|
Mora-Márquez F, Vázquez-Poletti JL, López de Heredia U. NGScloud2: optimized bioinformatic analysis using Amazon Web Services. PeerJ 2021; 9:e11237. [PMID: 33959420 PMCID: PMC8054753 DOI: 10.7717/peerj.11237] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/08/2020] [Accepted: 03/17/2021] [Indexed: 12/13/2022] Open
Abstract
Background NGScloud was a bioinformatic system developed to perform de novo RNAseq analysis of non-model species by exploiting the cloud computing capabilities of Amazon Web Services. The rapid changes undergone in the way this cloud computing service operates, along with the continuous release of novel bioinformatic applications to analyze next generation sequencing data, have made the software obsolete. NGScloud2 is an enhanced and expanded version of NGScloud that permits access to ad hoc cloud computing infrastructure, scaled according to the complexity of each experiment. Methods NGScloud2 presents major technical improvements, such as the possibility of running spot instances and the most up-to-date AWS instance types, that can lead to significant cost savings. As compared to its initial implementation, this improved version updates and includes common applications for de novo RNAseq analysis, and incorporates tools to operate workflows of bioinformatic analysis of reference-based RNAseq, RADseq and functional annotation. NGScloud2 optimizes the access to Amazon’s large computing infrastructures to easily run popular bioinformatic software applications, otherwise inaccessible to non-specialized users lacking suitable hardware infrastructures. Results The correct performance of the pipelines for de novo RNAseq, reference-based RNAseq, RADseq and functional annotation was tested with real experimental data, providing workflow performance estimates and tips to make optimal use of NGScloud2. Further, we provide a qualitative comparison of NGScloud2 vs. the Galaxy framework. NGScloud2 code and instructions for software installation and use are available at https://github.com/GGFHF/NGScloud2. NGScloud2 includes a companion package, NGShelper, which contains Python utilities to post-process the output of the pipelines for downstream analysis, available at https://github.com/GGFHF/NGShelper.
Affiliation(s)
- Fernando Mora-Márquez
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Madrid, Spain
| | - José Luis Vázquez-Poletti
- GI Arquitectura de Sistemas Distribuidos, Dpto. de Arquitectura de Ordenadores y Automática, Facultad de Informática, Universidad Complutense de Madrid, Madrid, Spain
| | - Unai López de Heredia
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Madrid, Spain
|
40
|
Bliznina A, Masunaga A, Mansfield MJ, Tan Y, Liu AW, West C, Rustagi T, Chien HC, Kumar S, Pichon J, Plessy C, Luscombe NM. Telomere-to-telomere assembly of the genome of an individual Oikopleura dioica from Okinawa using Nanopore-based sequencing. BMC Genomics 2021; 22:222. [PMID: 33781200 PMCID: PMC8008620 DOI: 10.1186/s12864-021-07512-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/21/2020] [Accepted: 03/05/2021] [Indexed: 11/10/2022] Open
Abstract
Background The larvacean Oikopleura dioica is an abundant tunicate plankton with the smallest (65–70 Mbp) non-parasitic, non-extremophile animal genome identified to date. Currently, there are two genomes available for the Bergen (OdB3) and Osaka (OSKA2016) O. dioica laboratory strains. Both assemblies have full genome coverage and high sequence accuracy. However, a chromosome-scale assembly has not yet been achieved. Results Here, we present a chromosome-scale genome assembly (OKI2018_I69) of the Okinawan O. dioica produced using long-read Nanopore and short-read Illumina sequencing data from a single male, combined with Hi-C chromosomal conformation capture data for scaffolding. The OKI2018_I69 assembly has a total length of 64.3 Mbp distributed among 19 scaffolds. 99% of the assembly is contained within five megabase-scale scaffolds. We found telomeres on both ends of the two largest scaffolds, which represent assemblies of two fully contiguous autosomal chromosomes. Each of the other three large scaffolds has telomeres at one end only and we propose that they correspond to sex chromosomes split into a pseudo-autosomal region and X-specific or Y-specific regions. Indeed, these five scaffolds mostly correspond to equivalent linkage groups in OdB3, suggesting overall agreement in chromosomal organization between the two populations. At a more detailed level, the OKI2018_I69 assembly possesses similar genomic features in gene content and repetitive elements reported for OdB3. The Hi-C map suggests few reciprocal interactions between chromosome arms. At the sequence level, multiple genomic features such as GC content and repetitive elements are distributed differently along the short and long arms of the same chromosome. Conclusions We show that a hybrid approach of integrating multiple sequencing technologies with chromosome conformation information results in an accurate de novo chromosome-scale assembly of O. dioica’s highly polymorphic genome. This genome assembly opens up the possibility of cross-genome comparison between O. dioica populations, as well as of studies of chromosomal evolution in this lineage. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07512-6.
Affiliation(s)
- Aleksandra Bliznina
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
| | - Aki Masunaga
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Michael J Mansfield
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Yongkai Tan
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Andrew W Liu
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Charlotte West
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.,Francis Crick Institute, London, UK
| | - Tanmay Rustagi
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Hsiao-Chiao Chien
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Saurabh Kumar
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Julien Pichon
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Charles Plessy
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
| | - Nicholas M Luscombe
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.,Francis Crick Institute, London, UK.,Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, UK
|
41
|
Behera S, Voshall A, Moriyama EN. Plant Transcriptome Assembly: Review and Benchmarking. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
42
|
Galise TR, Esposito S, D'Agostino N. Guidelines for Setting Up a mRNA Sequencing Experiment and Best Practices for Bioinformatic Data Analysis. Methods Mol Biol 2021; 2264:137-162. [PMID: 33263908 DOI: 10.1007/978-1-0716-1201-9_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
RNA-sequencing, commonly referred to as RNA-seq, is the most recently developed method for the analysis of transcriptomes. It uses high-throughput next-generation sequencing technologies and has revolutionized our understanding of the complexity and dynamics of whole transcriptomes. In this chapter, we recall the key developments in transcriptome analysis and dissect the different steps of the general workflow that can be run by users to design and perform a mRNA-seq experiment as well as to process mRNA-seq data obtained by the Illumina technology. The chapter proposes guidelines for completing a mRNA-seq study properly and makes available recommendations for best practices based on recent literature and on the latest developments in technology and algorithms. We also note the large number of choices available (especially for bioinformatic data analysis), which may overwhelm the scientist. In the last part of the chapter we discuss the new frontiers of single-cell RNA-seq and isoform sequencing by long read technology.
Affiliation(s)
- Teresa Rosa Galise
- Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy
| | - Salvatore Esposito
- CREA Research Centre for Vegetable and Ornamental Crops, Pontecagnano Faiano, Italy
| | - Nunzio D'Agostino
- Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy.
|
43
|
Sharma P, Sharma BS, Verma RJ. A Guide to RNAseq Data Analysis Using Bioinformatics Approaches. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
|
44
|
Mora-Márquez F, Vázquez-Poletti JL, Chano V, Collada C, Soto Á, de Heredia UL. Hardware Performance Evaluation of De novo Transcriptome Assembly Software in Amazon Elastic Compute Cloud. Curr Bioinform 2020. [DOI: 10.2174/1574893615666191219095817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
Background: Bioinformatics software for RNA-seq analysis has a high computational requirement in terms of the number of CPUs, RAM size, and processor characteristics. Specifically, de novo transcriptome assembly demands large computational infrastructure due to the massive data size and the complexity of the algorithms employed. Comparative studies on the quality of the transcriptome yielded by de novo assemblers have been previously published, lacking, however, a hardware efficiency-oriented approach to help select the assembly hardware platform in a cost-efficient way.
Objective: We tested the performance of two popular de novo transcriptome assemblers, Trinity and SOAPdenovo-Trans (SDNT), in terms of cost-efficiency and quality to assess limitations, and provided troubleshooting and guidelines to run transcriptome assemblies efficiently.
Methods: We built virtual machines with different hardware characteristics (CPU number, RAM size) in the Amazon Elastic Compute Cloud of the Amazon Web Services. Using simulated and real data sets, we measured the elapsed time, cost, CPU percentage and output size of small and large data set assemblies.
Results: For small data sets, SDNT outperformed Trinity by an order of magnitude, significantly reducing the duration and cost of the assembly. For large data sets, Trinity performed better than SDNT. Both assemblers provide good-quality transcriptomes.
Conclusion: The selection of the optimal transcriptome assembler and provision of computational resources depend on the combined effect of size and complexity of RNA-seq experiments.
Affiliation(s)
- Fernando Mora-Márquez
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - José Luis Vázquez-Poletti
- GI Arquitectura de Sistemas Distribuidos, Dpto. Arquitectura de Computadores y Automatica, Facultad de Informatica, Universidad Complutense de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - Víctor Chano
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - Carmen Collada
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - Álvaro Soto
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - Unai López de Heredia
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
|
45
|
Alejo-Jacuinde G, González-Morales SI, Oropeza-Aburto A, Simpson J, Herrera-Estrella L. Comparative transcriptome analysis suggests convergent evolution of desiccation tolerance in Selaginella species. BMC PLANT BIOLOGY 2020; 20:468. [PMID: 33046015 PMCID: PMC7549206 DOI: 10.1186/s12870-020-02638-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/03/2020] [Accepted: 09/04/2020] [Indexed: 05/04/2023]
Abstract
BACKGROUND Desiccation tolerant Selaginella species evolved to survive extreme environmental conditions. Studies to determine the mechanisms involved in the acquisition of desiccation tolerance (DT) have focused on only a few Selaginella species. Due to the large diversity in morphology and the wide range of responses to desiccation within the genus, the understanding of the molecular basis of DT in Selaginella species is still limited. RESULTS Here we present a reference transcriptome for the desiccation tolerant species S. sellowii and the desiccation sensitive species S. denticulata. The analysis also included transcriptome data for the well-studied S. lepidophylla (desiccation tolerant), in order to identify DT mechanisms that are independent of morphological adaptations. We used a comparative approach to discriminate between DT responses and the common water loss response in Selaginella species. Predicted proteomes show strong homology, but most of the desiccation responsive genes differ between species. Despite such differences, functional analysis revealed that tolerant species with different morphologies employ similar mechanisms to survive desiccation. Significant functions involved in DT and shared by both tolerant species included induction of antioxidant systems, amino acid and secondary metabolism, whereas species-specific responses included cell wall modification and carbohydrate metabolism. CONCLUSIONS Reference transcriptomes generated in this work represent a valuable resource to study Selaginella biology and plant evolution in relation to DT. Our results provide evidence of convergent evolution of S. sellowii and S. lepidophylla due to the different gene sets that underwent selection to acquire DT.
Affiliation(s)
- Gerardo Alejo-Jacuinde
- National Laboratory of Genomics for Biodiversity (Langebio), Unit of Advanced Genomics, CINVESTAV, 36824 Irapuato, Guanajuato Mexico
- Department of Genetic Engineering, CINVESTAV, 36824 Irapuato, Guanajuato Mexico
| | | | - Araceli Oropeza-Aburto
- National Laboratory of Genomics for Biodiversity (Langebio), Unit of Advanced Genomics, CINVESTAV, 36824 Irapuato, Guanajuato Mexico
| | - June Simpson
- Department of Genetic Engineering, CINVESTAV, 36824 Irapuato, Guanajuato Mexico
| | - Luis Herrera-Estrella
- National Laboratory of Genomics for Biodiversity (Langebio), Unit of Advanced Genomics, CINVESTAV, 36824 Irapuato, Guanajuato Mexico
- Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, TX 79409 USA
|
46
|
Puglia GD, Prjibelski AD, Vitale D, Bushmanova E, Schmid KJ, Raccuia SA. Hybrid transcriptome sequencing approach improved assembly and gene annotation in Cynara cardunculus (L.). BMC Genomics 2020; 21:317. [PMID: 32819282 PMCID: PMC7441626 DOI: 10.1186/s12864-020-6670-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/31/2019] [Accepted: 03/13/2020] [Indexed: 12/11/2022] Open
Abstract
Background The investigation of transcriptome profiles using short reads in non-model organisms, which lack well-annotated genomes, is limited by partial gene reconstruction and isoform detection. In contrast, long-read sequencing techniques have shown their potential to generate complete transcript assemblies even when a reference genome is lacking. Cynara cardunculus var. altilis (DC) (cultivated cardoon) is a perennial hardy crop adapted to dry environments with many industrial and nutraceutical applications due to the richness of secondary metabolites mostly produced in flower heads. The investigation of this species benefited from the recent release of a draft genome, but the transcriptome profile during capitulum formation still remains unexplored. In the present study we show a transcriptome analysis of vegetative and inflorescence organs of cultivated cardoon through a novel hybrid RNA-seq assembly approach utilizing both long and short RNA-seq reads. Results The inclusion of a single Nanopore flow-cell output in a hybrid sequencing approach yielded an increase of 15% in complete assembled genes and 18% in transcript isoforms with respect to short reads alone. Among 25,463 assembled unigenes, we identified 578 new genes and updated 13,039 gene models, 11,169 of which were alternatively spliced isoforms. During capitulum development, 3424 genes were differentially expressed and approximately two-thirds were identified as transcription factors including bHLH, MYB, NAC, C2H2 and MADS-box, which were highly expressed especially after capitulum opening. We also show the expression dynamics of key genes involved in the production of valuable secondary metabolites in which the capitulum is rich, such as phenylpropanoids, flavonoids and sesquiterpene lactones. Most of their biosynthetic genes were strongly transcribed in the flower heads, with alternative isoforms exhibiting differential expression levels across the tissues. Conclusions This novel hybrid sequencing approach allowed us to improve the transcriptome assembly, to update more than half of the annotated genes and to identify many novel genes and different alternatively spliced isoforms. This study provides new insights on the flowering cycle in an Asteraceae plant, a valuable resource for plant biology and breeding in Cynara and an effective method for improving gene annotation.
Affiliation(s)
- Giuseppe D Puglia
- Institute for Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Fruwirthstrasse 21, 70599, Stuttgart, Germany; Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo (CNR-ISAFOM) U.O.S. Catania, Via Empedocle, 58, 95128, Catania, Italy
- Andrey D Prjibelski
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Domenico Vitale
- Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo (CNR-ISAFOM) U.O.S. Catania, Via Empedocle, 58, 95128, Catania, Italy
- Elena Bushmanova
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Karl J Schmid
- Institute for Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Fruwirthstrasse 21, 70599, Stuttgart, Germany
- Salvatore A Raccuia
- Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo (CNR-ISAFOM) U.O.S. Catania, Via Empedocle, 58, 95128, Catania, Italy
47
Nip KM, Chiu R, Yang C, Chu J, Mohamadi H, Warren RL, Birol I. RNA-Bloom enables reference-free and reference-guided sequence assembly for single-cell transcriptomes. Genome Res 2020; 30:1191-1200. [PMID: 32817073 PMCID: PMC7462077 DOI: 10.1101/gr.260174.119] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 12/11/2019] [Accepted: 07/23/2020] [Indexed: 12/27/2022]
Abstract
Despite the rapid advance in single-cell RNA sequencing (scRNA-seq) technologies within the last decade, single-cell transcriptome analysis workflows have primarily used gene expression data while isoform sequence analysis at the single-cell level still remains fairly limited. Detection and discovery of isoforms in single cells is difficult because of the inherent technical shortcomings of scRNA-seq data, and existing transcriptome assembly methods are mainly designed for bulk RNA samples. To address this challenge, we developed RNA-Bloom, an assembly algorithm that leverages the rich information content aggregated from multiple single-cell transcriptomes to reconstruct cell-specific isoforms. Assembly with RNA-Bloom can be either reference-guided or reference-free, thus enabling unbiased discovery of novel isoforms or foreign transcripts. We compared both assembly strategies of RNA-Bloom against five state-of-the-art reference-free and reference-based transcriptome assembly methods. In our benchmarks on a simulated 384-cell data set, reference-free RNA-Bloom reconstructed 37.9%–38.3% more isoforms than the best reference-free assembler, whereas reference-guided RNA-Bloom reconstructed 4.1%–11.6% more isoforms than reference-based assemblers. When applied to a real 3840-cell data set consisting of more than 4 billion reads, RNA-Bloom reconstructed 9.7%–25.0% more isoforms than the best competing reference-based and reference-free approaches evaluated. We expect RNA-Bloom to boost the utility of scRNA-seq data beyond gene expression analysis, expanding what is informatically accessible now.
Affiliation(s)
- Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada V5Z 4S6
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada V5Z 4S6
- Chen Yang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada V5Z 4S6
- Justin Chu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada V5Z 4S6
- Hamid Mohamadi
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada V5Z 4S6
- René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada V5Z 4S6
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada V5Z 4S6; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada V6H 3N1
48
Miller CH, Campbell P, Sheehan MJ. Distinct evolutionary trajectories of V1R clades across mouse species. BMC Evol Biol 2020; 20:99. [PMID: 32770934 PMCID: PMC7414754 DOI: 10.1186/s12862-020-01662-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/22/2020] [Accepted: 07/21/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many animals rely heavily on olfaction to navigate their environment. Among rodents, olfaction is crucial for a wide range of social behaviors. The vomeronasal olfactory system in particular plays an important role in mediating social communication, including the detection of pheromones and recognition signals. In this study we examine patterns of vomeronasal type-1 receptor (V1R) evolution in the house mouse and related species within the genus Mus. We report the extent of gene repertoire turnover and conservation among species and clades, as well as the prevalence of positive selection on gene sequences across the V1R tree. By exploring the evolution of these receptors, we provide insight into the functional roles of receptor subtypes as well as the dynamics of gene family evolution. RESULTS We generated transcriptomes from the vomeronasal organs of five Mus species, and produced high-quality V1R repertoires for each species. We find that V1R clades in the house mouse and relatives exhibit distinct evolutionary trajectories. We identify putative species-specific gene expansions, including a large clade D expansion in the house mouse. While gene gains are abundant, we detect very few gene losses. We describe a novel V1R clade and highlight candidate receptors for future study. We find evidence for distinct evolutionary processes across different clades, from large-scale turnover to highly conserved repertoires. Patterns of positive selection are similarly variable, as some clades exhibit abundant positive selection while others display high gene sequence conservation. Based on clade-level evolutionary patterns, we identify receptor families that are strong candidates for detecting social signals and predator cues. Our results reveal that clades with receptors detecting female reproductive status are among the most conserved across species, suggesting an important role in V1R chemosensation.
CONCLUSION Analysis of clade-level evolution is critical for understanding species' chemosensory adaptations. This study provides clear evidence that V1R clades are characterized by distinct evolutionary trajectories. As receptor evolution is shaped by ligand identity, these results provide a framework for examining the functional roles of receptors.
Affiliation(s)
- Polly Campbell
- Evolution, Ecology and Organismal Biology, University of California-Riverside, Riverside, USA
49
Prjibelski AD, Puglia GD, Antipov D, Bushmanova E, Giordano D, Mikheenko A, Vitale D, Lapidus A. Extending rnaSPAdes functionality for hybrid transcriptome assembly. BMC Bioinformatics 2020; 21:302. [PMID: 32703149 PMCID: PMC7379828 DOI: 10.1186/s12859-020-03614-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/14/2020] [Accepted: 06/18/2020] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or is poorly annotated. However, due to the short length of Illumina reads it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged possibility of generating long RNA reads, with technologies such as PacBio and Oxford Nanopore, may dramatically improve the assembly quality, and thus the subsequent analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data. RESULTS In this work we present a novel method that enables high-quality de novo transcriptome assemblies by combining the accuracy and reliability of short reads with exon structure information carried by long error-prone reads. The algorithm is designed by incorporating the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapting it for transcriptomic data. CONCLUSION To evaluate the benefit of using long RNA reads we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms than assemblies that use only short-read data.
Affiliation(s)
- Andrey D Prjibelski
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Giuseppe D Puglia
- Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo, Catania, Italy
- Dmitry Antipov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Elena Bushmanova
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Daniela Giordano
- Department of Electrical, Electronics and Computer Engineering, University of Catania, Catania, Italy
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Domenico Vitale
- Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo, Catania, Italy
- Alla Lapidus
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
50
Mikheenko A, Bzikadze AV, Gurevich A, Miga KH, Pevzner PA. TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 2020; 36:i75-i83. [PMID: 32657355 PMCID: PMC7355294 DOI: 10.1093/bioinformatics/btaa440] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION Extra-long tandem repeats (ETRs) are widespread in eukaryotic genomes and play an important role in fundamental cellular processes, such as chromosome segregation. Although emerging long-read technologies have enabled ETR assemblies, the accuracy of such assemblies is difficult to evaluate since there are no tools for their quality assessment. Moreover, since the mapping of error-prone reads to ETRs remains an open problem, it is not clear how to polish draft ETR assemblies. RESULTS To address these problems, we developed the TandemTools software that includes the TandemMapper tool for mapping reads to ETRs and the TandemQUAST tool for polishing ETR assemblies and their quality assessment. We demonstrate that TandemTools not only reveals errors in ETR assemblies but also improves the recently generated assemblies of human centromeres. AVAILABILITY AND IMPLEMENTATION https://github.com/ablab/TandemTools. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
- Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA