251
Peng Z, He Y, Parajuli S, You Q, Wang W, Bhattarai K, Palmateer AJ, Deng Z. Integration of early disease-resistance phenotyping, histological characterization, and transcriptome sequencing reveals insights into downy mildew resistance in impatiens. Horticulture Research 2021; 8:108. [PMID: 33931631 PMCID: PMC8087834 DOI: 10.1038/s41438-021-00543-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/15/2020] [Revised: 03/11/2021] [Accepted: 03/22/2021] [Indexed: 05/11/2023]
Abstract
Downy mildew (DM), caused by obligate parasitic oomycetes, is a destructive disease of a wide range of crops worldwide. Recent outbreaks of impatiens downy mildew (IDM) in many countries have caused huge economic losses. A system to reveal plant-pathogen interactions in the early stage of infection and quickly assess the resistance/susceptibility of plants to DM is desired. In this study, we established an early and rapid system to achieve these goals using impatiens as a model. Thirty-two cultivars of Impatiens walleriana and I. hawkeri were evaluated for their responses to IDM at the cotyledon, first/second pair of true leaves, and mature plant stages. All I. walleriana cultivars were highly susceptible to IDM. While all I. hawkeri cultivars were resistant to IDM starting at the first true leaf stage, many (14/16) were susceptible to IDM at the cotyledon stage. Two cultivars showed resistance even at the cotyledon stage. Histological characterization showed that the resistance mechanism of the I. hawkeri cultivars resembles that in grapevine and type II resistance in sunflower. By integrating full-length transcriptome sequencing (Iso-Seq) and RNA-Seq, we constructed the first reference transcriptome for Impatiens, comprising 48,758 sequences with an N50 length of 2060 bp. Comparative transcriptome and qRT-PCR analyses revealed strong candidate genes for IDM resistance, including three resistance genes orthologous to the sunflower gene RGC203, a potential candidate associated with DM resistance. Our approach of integrating early disease-resistance phenotyping, histological characterization, and transcriptome analysis lays a solid foundation for improving DM resistance in impatiens and may provide a model for other crops.
Affiliation(s)
- Ze Peng
  - University of Florida, IFAS, Department of Environmental Horticulture, Gulf Coast Research and Education Center, 14625 County Road 672, Wimauma, FL, 33598, USA
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, 510642, Guangzhou, China
- Yanhong He
  - Visiting scholar at University of Florida, IFAS, Department of Environmental Horticulture, Gulf Coast Research and Education Center, 14625 County Road 672, Wimauma, FL, 33598, USA
  - Key Laboratory of Horticultural Plant Biology, Ministry of Education, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, 430070, Wuhan, Hubei, China
- Saroj Parajuli
  - University of Florida, IFAS, Department of Environmental Horticulture, Gulf Coast Research and Education Center, 14625 County Road 672, Wimauma, FL, 33598, USA
- Qian You
  - University of Florida, IFAS, Department of Environmental Horticulture, Gulf Coast Research and Education Center, 14625 County Road 672, Wimauma, FL, 33598, USA
- Weining Wang
  - University of Florida, IFAS, Department of Environmental Horticulture, Gulf Coast Research and Education Center, 14625 County Road 672, Wimauma, FL, 33598, USA
- Krishna Bhattarai
  - University of Florida, IFAS, Department of Environmental Horticulture, Gulf Coast Research and Education Center, 14625 County Road 672, Wimauma, FL, 33598, USA
- Aaron J Palmateer
  - University of Florida, IFAS, Department of Plant Pathology, Tropical Research and Education Center, 18905 S.W. 280th Street, Homestead, FL, 33031, USA
  - Bayer Environmental Science US, 5000 Centregreen Way, Cary, NC, 27513, USA
- Zhanao Deng
  - University of Florida, IFAS, Department of Environmental Horticulture, Gulf Coast Research and Education Center, 14625 County Road 672, Wimauma, FL, 33598, USA
252
Yang C, Li X, Wang Q, Yuan H, Huang Y, Xiao H. Genome-wide analyses of the relict gull (Larus relictus): insights and evolutionary implications. BMC Genomics 2021; 22:311. [PMID: 33926388 PMCID: PMC8082828 DOI: 10.1186/s12864-021-07616-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/21/2020] [Accepted: 04/14/2021] [Indexed: 12/20/2022]
Abstract
BACKGROUND The relict gull (Larus relictus) is classified as vulnerable on the IUCN Red List and is a first-class national protected bird in China. Genomic resources for L. relictus are lacking, which limits the study of its evolution and its conservation. RESULTS In this study, based on the Illumina and PacBio sequencing platforms, we successfully assembled the genome of L. relictus, one of the few known reference genomes in the genus Larus. The size of the final assembled genome was 1.21 Gb, with a contig N50 of 8.11 Mb. A total of 18,454 genes were predicted from the assembly, with 16,967 (91.94%) of these genes annotated. The genome contained 92.52 Mb of repeat sequence, accounting for 7.63% of the assembly. A phylogenetic tree constructed using 4902 single-copy orthologous genes showed that the closest relative of L. relictus was L. smithsonianus, with an estimated divergence time of 14.7 Mya between them. PSMC analyses indicated that L. relictus underwent a long-term population decline during 0.01-0.1 Mya, with a small effective population size, from 8800 to 2200 individuals. CONCLUSIONS This genome will be a valuable genomic resource for a range of genomic and conservation studies of L. relictus and will help to establish a foundation for further studies investigating whether the breeding population is a complex population. As the species is threatened by habitat loss and fragmentation, actions to protect L. relictus are suggested to alleviate the fragmentation of breeding populations.
Affiliation(s)
- Chao Yang
  - College of Life Sciences, Shaanxi Normal University, Xi'an, 710062, China
  - Shaanxi Institute of Zoology, Xi'an, 710032, China
- Xuejuan Li
  - College of Life Sciences, Shaanxi Normal University, Xi'an, 710062, China
- Hao Yuan
  - College of Life Sciences, Shaanxi Normal University, Xi'an, 710062, China
- Yuan Huang
  - College of Life Sciences, Shaanxi Normal University, Xi'an, 710062, China
- Hong Xiao
  - Shaanxi Institute of Zoology, Xi'an, 710032, China
253
Ramberg S, Høyheim B, Østbye TKK, Andreassen R. A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon. Front Genet 2021; 12:656334. [PMID: 33986770 PMCID: PMC8110904 DOI: 10.3389/fgene.2021.656334] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/20/2021] [Accepted: 04/01/2021] [Indexed: 12/18/2022]
Abstract
Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequence based in silico predictions supported by ESTs and short-read sequencing data. However, recent progress in long-read sequencing technologies now allows for full-length transcript sequencing from single RNA-molecules. This study provides a de novo full-length mRNA transcriptome from liver, head-kidney and gill materials. A pipeline was developed based on Iso-seq sequencing of long-reads on the PacBio platform (HQ reads) followed by error-correction of the HQ reads by short-reads from the Illumina platform. The pipeline successfully processed more than 1.5 million long-reads and more than 900 million short-reads into error-corrected HQ reads. A surprisingly high percentage (32%) represented expressed interspersed repeats, while the remaining were processed into 71 461 full-length mRNAs from 23 071 loci. Each transcript was supported by several single-molecule long-read sequences and at least three short-reads, assuring a high sequence accuracy. On average, each gene was represented by three isoforms. Comparisons to the current Atlantic salmon transcripts in the RefSeq database showed that the long-read transcriptome validated 25% of all known transcripts, while the remaining full-length transcripts were novel isoforms, but few were transcripts from novel genes. A comparison to the current genome assembly indicates that the long-read transcriptome may aid in improving transcript annotation as well as provide long-read linkage information useful for improving the genome assembly. 
More than 80% of transcripts were assigned GO terms and thousands of transcripts were from genes or splice-variants expressed in an organ-specific manner demonstrating that hybrid error-corrected long-read transcriptomes may be applied to study genes and splice-variants expressed in certain organs or conditions (e.g., challenge materials). In conclusion, this is the single largest contribution of full-length mRNAs in Atlantic salmon. The results will be of great value to salmon genomics research, and the pipeline outlined may be applied to generate additional de novo transcriptomes in Atlantic Salmon or applied for similar projects in other species.
Affiliation(s)
- Sigmund Ramberg
  - Department of Life Sciences and Health, Faculty of Health Sciences, OsloMet - Oslo Metropolitan University, Oslo, Norway
- Bjørn Høyheim
  - Department of Preclinical Sciences and Pathology, Faculty of Veterinary Medicine, Norwegian University of Life Sciences, Ås, Norway
- Rune Andreassen
  - Department of Life Sciences and Health, Faculty of Health Sciences, OsloMet - Oslo Metropolitan University, Oslo, Norway
254
Kolchanova S, Komissarov A, Kliver S, Mazo-Vargas A, Afanador Y, Velez-Valentín J, de la Rosa RV, Castro-Marquez S, Rivera-Colon I, Majeske AJ, Wolfsberger WW, Hains T, Corvelo A, Martinez-Cruzado JC, Glenn TC, Robinson O, Koepfli KP, Oleksyk TK. Molecular Phylogeny and Evolution of Amazon Parrots in the Greater Antilles. Genes (Basel) 2021; 12:608. [PMID: 33924228 PMCID: PMC8074781 DOI: 10.3390/genes12040608] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2021] [Revised: 04/13/2021] [Accepted: 04/16/2021] [Indexed: 01/10/2023]
Abstract
Amazon parrots (Amazona spp.) colonized the islands of the Greater Antilles from the Central American mainland, but there has not been a consensus as to how and when this happened. Today, most of the five remaining island species are listed as endangered, threatened, or vulnerable as a consequence of human activity. We sequenced and annotated full mitochondrial genomes of all the extant Amazon parrot species from the Greater Antilles (A. leucocephala (Cuba), A. agilis, A. collaria (both from Jamaica), A. ventralis (Hispaniola), and A. vittata (Puerto Rico)), A. albifrons from mainland Central America, and A. rhodocorytha from the Atlantic Forest in Brazil. The assembled and annotated mitogenome maps provide information on sequence organization, variation, population diversity, and evolutionary history for the Caribbean species, including the critically endangered A. vittata. Despite the larger number of available samples from the Puerto Rican Parrot Recovery Program, the sequence diversity of the A. vittata population in Puerto Rico was the lowest among all parrot species analyzed. Our data support the stepping-stone dispersal and speciation hypothesis, in which dispersal started approximately 3.47 MYA when the ancestral population arrived from mainland Central America and led to diversification across the Greater Antilles, ultimately reaching the island of Puerto Rico 0.67 MYA. The results are presented and discussed in light of the geological history of the Caribbean and in the context of recent parrot evolution, island biogeography, and conservation. This analysis contributes to understanding evolutionary history, empowers subsequent assessments of sequence variation, and helps design future conservation efforts in the Caribbean.
Affiliation(s)
- Sofiia Kolchanova
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
  - Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, 199034 St. Petersburg, Russia
- Alexey Komissarov
  - Applied Genomics Laboratory, SCAMT Institute, ITMO University, 191002 St. Petersburg, Russia
- Sergei Kliver
  - Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, 664033 Novosibirsk, Russia
- Anyi Mazo-Vargas
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
- Yashira Afanador
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
- Jafet Velez-Valentín
  - Conservation Program of the Puerto Rican Parrot, U.S. Fish and Wildlife Service, Rio Grande 00745, Puerto Rico
- Ricardo Valentín de la Rosa
  - The Recovery Program of the Puerto Rican Parrot at the Rio Abajo State Forest, Departamento de Recursos Naturales y Ambientales de Puerto Rico, Arecibo 00613, Puerto Rico
- Stephanie Castro-Marquez
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
  - Department of Biological Sciences, Oakland University, Rochester, MI 48307, USA
- Israel Rivera-Colon
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
- Audrey J. Majeske
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
  - Department of Biological Sciences, Oakland University, Rochester, MI 48307, USA
- Walter W. Wolfsberger
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
  - Department of Biological Sciences, Oakland University, Rochester, MI 48307, USA
  - Department of Biology, Uzhhorod National University, 88000 Uzhhorod, Ukraine
- Taylor Hains
  - Terra Wildlife Genomics, Washington, DC 20009, USA
  - Environmental Science and Policy, Johns Hopkins University, Washington, DC 20036, USA
- Juan-Carlos Martinez-Cruzado
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
- Travis C. Glenn
  - Department of Environmental Health, The University of Georgia, Athens, GA 30602, USA
- Klaus-Peter Koepfli
  - Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, 199034 St. Petersburg, Russia
  - Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA 22630, USA
- Taras K. Oleksyk
  - Biology Department, University of Puerto Rico at Mayagüez, Mayagüez 00682, Puerto Rico
  - Department of Biological Sciences, Oakland University, Rochester, MI 48307, USA
  - Department of Biology, Uzhhorod National University, 88000 Uzhhorod, Ukraine
255
Jiao X, Shi J, Qin S, Huang D, Wang Y. Dataset of the transcriptomes of Urechis unicinctus to identify differentially expressed genes (DEGs) under different temperature and exposure to open air. Data Brief 2021; 35:106941. [PMID: 33842678 PMCID: PMC8020418 DOI: 10.1016/j.dib.2021.106941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2021] [Revised: 02/22/2021] [Accepted: 03/03/2021] [Indexed: 10/29/2022]
Abstract
Urechis unicinctus contains a wide range of bioactive polypeptides and has high food, economic and medicinal value. Artificial breeding, as the key technical breakthrough, is imperative. However, seedling transport is a primary concern, which makes it essential to understand how Urechis unicinctus responds to various conditions. We compared the transcriptomes of Urechis unicinctus under desiccation and ultraviolet irradiation treatments and at different temperatures. The dataset describing the organism's response to water-temperature variation was generated with the Illumina HiSeq X Ten system and will help in understanding the adaptation of Urechis unicinctus to changing temperature (low, high and room temperature) and to open air (ultraviolet and desiccation). The transcriptomes were assembled using the isoform sequencing (Iso-seq) method. The functions of expressed genes were annotated and categorized, and the DEGs are presented.
Affiliation(s)
- Xudong Jiao
  - Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China
- Jiaxin Shi
  - College of Oceanic and Atmospheric Sciences, Ocean University of China, Qingdao 266000, China
- Song Qin
  - Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China
- Dong Huang
  - Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China
  - Agronomy College, Rudong University, Shandong, Yantai 264025, China
- Yinchu Wang
  - Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China
256
Fernandez‐Pozo N, Metz T, Chandler JO, Gramzow L, Mérai Z, Maumus F, Mittelsten Scheid O, Theißen G, Schranz ME, Leubner‐Metzger G, Rensing SA. Aethionema arabicum genome annotation using PacBio full-length transcripts provides a valuable resource for seed dormancy and Brassicaceae evolution research. The Plant Journal: for Cell and Molecular Biology 2021; 106:275-293. [PMID: 33453123 PMCID: PMC8641386 DOI: 10.1111/tpj.15161] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/13/2020] [Revised: 12/31/2020] [Accepted: 01/08/2021] [Indexed: 05/06/2023]
Abstract
Aethionema arabicum is an important model plant for Brassicaceae trait evolution, particularly of seed (development, regulation, germination, dormancy) and fruit (development, dehiscence mechanisms) characters. Its genome assembly was recently improved but the gene annotation was not updated. Here, we improved the Ae. arabicum gene annotation using 294 RNA-seq libraries and 136 307 full-length PacBio Iso-seq transcripts, increasing BUSCO completeness by 11.6% and featuring 5606 additional genes. Analysis of orthologs showed a lower number of genes in Ae. arabicum than in other Brassicaceae, which could be partially explained by loss of homeologs derived from the At-α polyploidization event and by a lower occurrence of tandem duplications after divergence of Aethionema from the other Brassicaceae. Benchmarking of MADS-box genes identified orthologs of FUL and AGL79 not found in previous versions. Analysis of full-length transcripts related to ABA-mediated seed dormancy discovered a conserved isoform of PIF6-β and antisense transcripts in ABI3, ABI4 and DOG1, among other cases found of different alternative splicing between Turkey and Cyprus ecotypes. The presented data allow alternative splicing mining and proposition of numerous hypotheses to research evolution and functional genomics. Annotation data and sequences are available at the Ae. arabicum DB (https://plantcode.online.uni-marburg.de/aetar_db).
Affiliation(s)
- Noe Fernandez‐Pozo
  - Plant Cell Biology, Department of Biology, University of Marburg, Marburg, Germany
- Timo Metz
  - Plant Cell Biology, Department of Biology, University of Marburg, Marburg, Germany
- Jake O. Chandler
  - School of Biological Sciences, Royal Holloway University of London, Egham, Surrey, UK
- Lydia Gramzow
  - Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Jena, Germany
- Zsuzsanna Mérai
  - Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
- Ortrun Mittelsten Scheid
  - Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
- Günter Theißen
  - Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Jena, Germany
- M. Eric Schranz
  - Biosystematics Group, Wageningen University, Wageningen, The Netherlands
- Gerhard Leubner‐Metzger
  - School of Biological Sciences, Royal Holloway University of London, Egham, Surrey, UK
  - Laboratory of Growth Regulators, Centre of the Region Haná for Biotechnological and Agricultural Research, Palacký University and Institute of Experimental Botany, Academy of Sciences of the Czech Republic, Olomouc, Czech Republic
- Stefan A. Rensing
  - Plant Cell Biology, Department of Biology, University of Marburg, Marburg, Germany
  - BIOSS Centre for Biological Signaling Studies, University of Freiburg, Freiburg, Germany
  - LOEWE Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
257
Panthee S, Paudel A, Hamamoto H, Ogasawara AA, Iwasa T, Blom J, Sekimizu K. Complete genome sequence and comparative genomic analysis of Enterococcus faecalis EF-2001, a probiotic bacterium. Genomics 2021; 113:1534-1542. [PMID: 33771633 DOI: 10.1016/j.ygeno.2021.03.021] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/12/2020] [Revised: 03/10/2021] [Accepted: 03/21/2021] [Indexed: 12/22/2022]
Abstract
Enterococcus faecalis is a common human gut commensal bacterium. While some E. faecalis strains are probiotic, others are known to cause opportunistic infections, and a clear distinction between these strains is difficult using traditional taxonomic approaches. In this study, we completed the genome sequencing of EF-2001, a probiotic strain, using our in-house hybrid assembly approach. Comparative analysis showed that EF-2001 was devoid of cytolysins, major factors associated with pathogenesis, and was phylogenetically distant from pathogenic E. faecalis V583. Genomic analysis of strains with a publicly available complete genome sequence predicted that the drug-resistance genes dfrE, efrA, efrB, emeA, and lsaA were present in all strains, and that EF-2001 lacked additional drug-resistance genes. Core- and pan-genome analyses revealed a higher degree of genomic fluidity. We found 49 genes specific to EF-2001, further characterization of which may provide insights into its diverse biological activities. Our comparative genomic analysis approach could help predict the pathogenic or probiotic potential of E. faecalis, leading to an early distinction based on genome sequences.
Affiliation(s)
- Suresh Panthee
  - Teikyo University Institute of Medical Mycology, Hachioji, Otsuka 359, Tokyo 192-0395, Japan
- Atmika Paudel
  - Teikyo University Institute of Medical Mycology, Hachioji, Otsuka 359, Tokyo 192-0395, Japan
  - Division of Infection and Immunity, Research Center for Zoonosis Control, Hokkaido University, North 20, West 10, Kita-ku, Sapporo Hokkaido 001-0020, Japan
- Hiroshi Hamamoto
  - Teikyo University Institute of Medical Mycology, Hachioji, Otsuka 359, Tokyo 192-0395, Japan
- Toshihiro Iwasa
  - NIHON BERUMU CO., LTD., 2-14-3 Nagatacho, Chiyoda-ku, Tokyo 100-0014, Japan
- Jochen Blom
  - Bioinformatics and Systems Biology, Justus-Liebig-University Giessen, Giessen, Germany
- Kazuhisa Sekimizu
  - Teikyo University Institute of Medical Mycology, Hachioji, Otsuka 359, Tokyo 192-0395, Japan
258
Rajewski A, Carter-House D, Stajich J, Litt A. Datura genome reveals duplications of psychoactive alkaloid biosynthetic genes and high mutation rate following tissue culture. BMC Genomics 2021; 22:201. [PMID: 33752605 PMCID: PMC7986286 DOI: 10.1186/s12864-021-07489-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/30/2020] [Accepted: 02/26/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND Datura stramonium (Jimsonweed) is a medicinally and pharmaceutically important plant in the nightshade family (Solanaceae) known for its production of various toxic, hallucinogenic, and therapeutic tropane alkaloids. Recently, we published a tissue-culture based transformation protocol for D. stramonium that enables more thorough functional genomics studies of this plant. However, the tissue culture process can lead to undesirable phenotypic and genomic consequences independent of the transgene used. Here, we have assembled and annotated a draft genome of D. stramonium with a focus on tropane alkaloid biosynthetic genes. We then use mRNA sequencing and genome resequencing of transformants to characterize changes following tissue culture. RESULTS Our draft assembly conforms to the expected 2 gigabasepair haploid genome size of this plant and achieved a BUSCO score of 94.7% complete, single-copy genes. The repetitive content of the genome is 61%, with Gypsy-type retrotransposons accounting for half of this. Our gene annotation estimates the number of protein-coding genes at 52,149 and shows evidence of duplications in two key alkaloid biosynthetic genes, tropinone reductase I and hyoscyamine 6 β-hydroxylase. Following tissue culture, we detected only 186 differentially expressed genes, but were unable to correlate these changes in expression with either polymorphisms from resequencing or positional effects of transposons. CONCLUSIONS We have assembled, annotated, and characterized the first draft genome for this important model plant species. Using this resource, we show duplications of genes leading to the synthesis of the medicinally important alkaloid, scopolamine. Our results also demonstrate that following tissue culture, mutation rates of transformed plants are quite high (1.16 × 10⁻³ mutations per site), but do not have a drastic impact on gene expression.
Affiliation(s)
- Alex Rajewski
  - Department of Botany and Plant Science, University of California, Riverside, California 92521 USA
- Derreck Carter-House
  - Department of Microbiology and Plant Pathology, University of California, Riverside, California 92521 USA
- Jason Stajich
  - Department of Microbiology and Plant Pathology, University of California, Riverside, California 92521 USA
- Amy Litt
  - Department of Botany and Plant Science, University of California, Riverside, California 92521 USA
259
Heo Y, Manikandan G, Ramachandran A, Chen D. Comprehensive Evaluation of Error-Correction Methodologies for Genome Sequencing Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
260
Hou Z, Shi F, Ge S, Tao J, Ren L, Wu H, Zong S. Comparative transcriptome analysis of the newly discovered insect vector of the pine wood nematode in China, revealing putative genes related to host plant adaptation. BMC Genomics 2021; 22:189. [PMID: 33726671 PMCID: PMC7968331 DOI: 10.1186/s12864-021-07498-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/15/2020] [Accepted: 03/02/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND In many insect species, the larvae/nymphs are unable to disperse far from the oviposition site selected by adults. The Sakhalin pine sawyer Monochamus saltuarius (Gebler) is the newly discovered insect vector of the pine wood nematode (Bursaphelenchus xylophilus) in China. Adult M. saltuarius prefers to oviposit on the host plant Pinus koraiensis, rather than P. tabuliformis. However, the genetic basis of adaptation of M. saltuarius larvae, which have weak dispersal ability, to the host environments selected by adults is not well understood. RESULTS In this study, the free amino acid and fatty acid composition and content of the host plants of M. saltuarius larvae, i.e., P. koraiensis and P. tabuliformis, were investigated. Compared with P. koraiensis, P. tabuliformis had a substantially higher content of various free amino acids, while the opposite trend was detected for fatty acid content. The transcriptional profiles of larval populations feeding on P. koraiensis and P. tabuliformis were compared using PacBio Sequel II sequencing combined with Illumina sequencing. The results showed that genes relating to digestion, fatty acid synthesis, detoxification, oxidation-reduction, and stress response, as well as nutrient and energy sensing ability, were differentially expressed, possibly reflecting adaptive changes of M. saltuarius in response to different host diets. Additionally, genes coding for cuticle structure were differentially expressed, indicating that the cuticle may be a potential target for plant defense. Differential regulation of genes related to the antibacterial and immune response was also observed, suggesting that larvae of M. saltuarius may have evolved adaptations to cope with bacterial challenges in their host environments. CONCLUSIONS The present study provides a comprehensive transcriptome resource for M. saltuarius relating to host plant adaptation. Results from this study help to illustrate the fundamental relationship between transcriptional plasticity and adaptation mechanisms of insect herbivores to host plants.
Affiliation(s)
- Zehai Hou
  - Key Laboratory of Beijing for the Control of Forest Pests, Beijing Forestry University, Beijing, China
- Fengming Shi
  - Key Laboratory of Beijing for the Control of Forest Pests, Beijing Forestry University, Beijing, China
- Sixun Ge
  - Key Laboratory of Beijing for the Control of Forest Pests, Beijing Forestry University, Beijing, China
- Jing Tao
  - Key Laboratory of Beijing for the Control of Forest Pests, Beijing Forestry University, Beijing, China
- Lili Ren
  - Key Laboratory of Beijing for the Control of Forest Pests, Beijing Forestry University, Beijing, China
- Hao Wu
  - Liaoning Provincial Key Laboratory of Dangerous Forest Pest Management and Control, Shenyang, China
- Shixiang Zong
  - Key Laboratory of Beijing for the Control of Forest Pests, Beijing Forestry University, Beijing, China
|
261
|
Park T, Wijeratne S, Meulia T, Firkins JL, Yu Z. The macronuclear genome of anaerobic ciliate Entodinium caudatum reveals its biological features adapted to the distinct rumen environment. Genomics 2021; 113:1416-1427. [PMID: 33722656 DOI: 10.1016/j.ygeno.2021.03.014] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Revised: 02/02/2021] [Accepted: 03/05/2021] [Indexed: 10/21/2022]
Abstract
Entodinium caudatum is an anaerobic binucleated ciliate representing the most dominant protozoal species in the rumen. However, its biological features are largely unknown due to the inability to establish an axenic culture. In this study, we first sequenced its macronucleus (MAC) genome to aid the understanding of its metabolism, physiology, and ecology. We isolated the MAC of E. caudatum strain MZG-1 and sequenced the MAC genome using Illumina MiSeq, MinION, and PacBio RSII systems. De novo assembly of the MiSeq sequence reads, followed by scaffolding with MinION and PacBio reads, resulted in a draft MAC genome of about 117 Mbp. A large number of carbohydrate-active enzymes were likely acquired through horizontal gene transfer. About 8.74% of the E. caudatum predicted proteome was predicted to be proteases. The MAC genome of E. caudatum will help to better understand its important roles in rumen carbohydrate metabolism and its interaction with other members of the rumen microbiome.
Affiliation(s)
- Tansol Park
- Department of Animal Sciences, The Ohio State University, Columbus, OH, 43210, USA
| | - Saranga Wijeratne
- Molecular and Cellular Imaging Center, Ohio Agricultural Research and Development Center, The Ohio State University, Wooster, OH, 44691, USA
| | - Tea Meulia
- Molecular and Cellular Imaging Center, Ohio Agricultural Research and Development Center, The Ohio State University, Wooster, OH, 44691, USA; Department of Plant Pathology, The Ohio State University, Wooster, OH, 44691, USA
| | - Jeffrey L Firkins
- Department of Animal Sciences, The Ohio State University, Columbus, OH, 43210, USA
| | - Zhongtang Yu
- Department of Animal Sciences, The Ohio State University, Columbus, OH, 43210, USA.
| |
|
262
|
Broseus L, Thomas A, Oldfield AJ, Severac D, Dubois E, Ritchie W. TALC: Transcript-level Aware Long-read Correction. Bioinformatics 2021; 36:5000-5006. [PMID: 32910174 DOI: 10.1093/bioinformatics/btaa634] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2020] [Revised: 05/08/2020] [Accepted: 07/09/2020] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION Long-read sequencing technologies are invaluable for determining complex RNA transcript architectures but are error-prone. Numerous 'hybrid correction' algorithms have been developed for genomic data that correct long reads by exploiting the accuracy and depth of short reads sequenced from the same sample. These algorithms are not suited for correcting more complex transcriptome sequencing data. RESULTS We have created a novel reference-free algorithm called Transcript-level Aware Long-Read Correction (TALC) which models changes in RNA expression and isoform representation in a weighted De Bruijn graph to correct long reads from transcriptome studies. We show that transcript-level aware correction by TALC improves the accuracy of the whole spectrum of downstream RNA-seq applications and is thus necessary for transcriptome analyses that use long read technology. AVAILABILITY AND IMPLEMENTATION TALC is implemented in C++ and available at https://github.com/lbroseus/TALC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
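The weighted de Bruijn graph named in the abstract above can be illustrated with a minimal sketch. This is not TALC's implementation (which is in C++ and additionally models expression changes and isoform representation); it only shows, under simplifying assumptions, how short-read k-mer transitions can be counted so that edge weights reflect read support. The function names `weighted_de_bruijn` and `best_successor` and the toy reads are hypothetical.

```python
from collections import Counter


def weighted_de_bruijn(reads, k):
    """Count every (k-mer -> next k-mer) transition seen in the reads.

    The returned Counter maps an edge (source k-mer, target k-mer) to the
    number of times that transition occurs, i.e. the edge weight.
    """
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k):
            edges[(read[i:i + k], read[i + 1:i + k + 1])] += 1
    return edges


def best_successor(edges, kmer):
    """Return the successor of `kmer` with the highest edge weight, or None."""
    candidates = {dst: w for (src, dst), w in edges.items() if src == kmer}
    return max(candidates, key=candidates.get) if candidates else None


# Toy short reads (hypothetical): two agree, one carries a variant base.
short_reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
graph = weighted_de_bruijn(short_reads, k=3)
```

In a hybrid-correction setting, a path through heavily weighted edges would be preferred when replacing an uncertain stretch of a long read; how TALC chooses among competing paths is described in the paper, not here.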
Affiliation(s)
- Lucile Broseus
- Department of Genome Dynamics, Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier 34396, France
| | - Aubin Thomas
- Department of Genome Dynamics, Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier 34396, France
| | - Andrew J Oldfield
- Department of Genome Dynamics, Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier 34396, France
| | - Dany Severac
- MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier Cedex 5 34094, France
| | - Emeric Dubois
- MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier Cedex 5 34094, France
| | - William Ritchie
- Department of Genome Dynamics, Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier 34396, France
| |
|
263
|
Chen X, Tong C, Zhang X, Song A, Hu M, Dong W, Chen F, Wang Y, Tu J, Liu S, Tang H, Zhang L. A high-quality Brassica napus genome reveals expansion of transposable elements, subgenome evolution and disease resistance. PLANT BIOTECHNOLOGY JOURNAL 2021; 19:615-630. [PMID: 33073445 PMCID: PMC7955885 DOI: 10.1111/pbi.13493] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/16/2020] [Revised: 09/21/2020] [Accepted: 10/13/2020] [Indexed: 05/03/2023]
Abstract
Rapeseed (Brassica napus L.) is a recent allotetraploid crop, which is well known for its high oil production. Here, we report a high-quality genome assembly of a typical semi-winter rapeseed cultivar, 'Zhongshuang11' (hereafter 'ZS11'), using a combination of single-molecule sequencing and chromosome conformation capture (Hi-C) techniques. Most of the high-confidence sequences (93.1%) were anchored to the individual chromosomes, with a total of 19 centromeres identified, matching the exact chromosome count of B. napus. The repeat sequences in the A and C subgenomes of B. napus have expanded significantly since 500 000 years ago, especially over the last 100 000 years. These young and recently amplified LTR-RTs showed a dispersed chromosomal distribution but clustered preferentially in centromeric regions. We exhaustively annotated the nucleotide-binding leucine-rich repeat (NLR) gene repertoire, yielding a total of 597 NLR genes in the B. napus genome, 17.4% of which are paired (head-to-head arrangement). Based on the resequencing data of 991 B. napus accessions, we identified 18 759 245 single nucleotide polymorphisms (SNPs) and detected a large number of genomic regions under selective sweep among the three major ecotype groups (winter, semi-winter and spring) in B. napus. We found 49 NLR genes and five NLR gene pairs colocated in selective sweep regions with different ecotypes, suggesting a rapid diversification of NLR genes during the domestication of B. napus. The high quality of our B. napus 'ZS11' genome assembly could serve as an important resource for the study of rapeseed genomics and reveal the genetic variations associated with important agronomic traits.
Affiliation(s)
- Xuequn Chen
- Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics & Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Chaobo Tong
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, The Ministry of Agriculture and Rural Affairs of PRC, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan, China
| | - Xingtan Zhang
- Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics & Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Aixia Song
- Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics & Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Ming Hu
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, The Ministry of Agriculture and Rural Affairs of PRC, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan, China
| | - Wei Dong
- Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics & Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Fei Chen
- College of Horticulture, Nanjing Agricultural University, Nanjing, China
| | - Youping Wang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education, Yangzhou University, Yangzhou, China
| | - Jinxing Tu
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, China
| | - Shengyi Liu
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, The Ministry of Agriculture and Rural Affairs of PRC, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan, China
| | - Haibao Tang
- Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics & Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Liangsheng Zhang
- Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics & Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, China
- Genomics and Genetic Engineering Laboratory of Ornamental Plants, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| |
|
264
|
Liu T, Li M, Liu Z, Ai X, Li Y. Reannotation of the cultivated strawberry genome and establishment of a strawberry genome database. HORTICULTURE RESEARCH 2021; 8:41. [PMID: 33642572 PMCID: PMC7917095 DOI: 10.1038/s41438-021-00476-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/04/2020] [Revised: 12/16/2020] [Accepted: 12/22/2020] [Indexed: 05/04/2023]
Abstract
Cultivated strawberry (Fragaria × ananassa) is an important fruit crop species whose fruits are enjoyed by many worldwide. An octoploid of hybrid origin, the complex genome of this species was recently sequenced, serving as a key reference genome for cultivated strawberry and related species of the Rosaceae family. The current annotation of the F. ananassa genome mainly relies on ab initio predictions and, to a lesser extent, transcriptome data. Here, we present the structural and functional reannotation of the F. ananassa genome based on one PacBio full-length RNA library and ninety-two Illumina RNA-Seq libraries. This improved annotation of the F. ananassa genome, v1.0.a2, comprises a total of 108,447 gene models, with 97.85% complete BUSCOs. The models of 19,174 genes were modified, 360 new genes were identified, and 11,044 genes were found to have alternatively spliced isoforms. Additionally, we constructed a strawberry genome database (SGD) for strawberry gene homolog searching and annotation downloading. Finally, the transcriptomes of the receptacles and achenes of F. ananassa at four developmental stages were reanalyzed and quantified, and the expression profiles of all the genes in this annotation are also provided. Together, this study provides an updated annotation of the F. ananassa genome, which will facilitate genomic analyses across the Rosaceae family and gene functional studies in cultivated strawberry.
Affiliation(s)
- Tianjia Liu
- Institute of Fruit and Tea, Hubei Academy of Agricultural Sciences/Fruit and Tea Subcenter of Hubei Innovation Center of Agricultural Science and Technology, Wuhan, China
| | - Muzi Li
- Department of Cell Biology and Molecular Genetics, University of Maryland College Park, College Park, MD, USA
| | - Zhongchi Liu
- Department of Cell Biology and Molecular Genetics, University of Maryland College Park, College Park, MD, USA
| | - Xiaoyan Ai
- Institute of Fruit and Tea, Hubei Academy of Agricultural Sciences/Fruit and Tea Subcenter of Hubei Innovation Center of Agricultural Science and Technology, Wuhan, China.
| | - Yongping Li
- School of Life Sciences and State Key Laboratory of Agrobiotechnology, Chinese University of Hong Kong, Shatin, Hong Kong, China.
| |
|
265
|
Pei T, Yan M, Kong Y, Fan H, Liu J, Cui M, Fang Y, Ge B, Yang J, Zhao Q. The genome of Tripterygium wilfordii and characterization of the celastrol biosynthesis pathway. GIGABYTE 2021; 2021:gigabyte14. [PMID: 36967728 PMCID: PMC10038137 DOI: 10.46471/gigabyte.14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2020] [Accepted: 02/25/2021] [Indexed: 11/09/2022] Open
Abstract
Tripterygium wilfordii is a vine from the Celastraceae family that is used in traditional Chinese medicine (TCM). The active ingredient, celastrol, is a friedelane-type pentacyclic triterpenoid with putative roles as an antitumor, immunosuppressive, and anti-obesity agent. Here, we report a reference genome assembly of T. wilfordii with high-quality annotation using a hybrid sequencing strategy. The total genome size obtained is 340.12 Mb, with a contig N50 value of 3.09 Mb. We successfully anchored 91.02% of sequences into 23 pseudochromosomes using high-throughput chromosome conformation capture (Hi–C) technology. The super-scaffold N50 value was 13.03 Mb. We also annotated 31,593 structural genes, with a repeat percentage of 44.31%. These data demonstrate that T. wilfordii diverged from Malpighiales species approximately 102.4 million years ago. By integrating genome, transcriptome and metabolite analyses, as well as in vivo and in vitro enzyme assays of two cytochrome P450 (CYP450) genes, TwCYP712K1 and TwCYP712K2, it is possible to investigate the second biosynthesis step of celastrol and demonstrate that this was derived from a common ancestor. These data provide insights and resources for further investigation of pathways related to celastrol, and valuable information to aid the conservation of resources and the understanding of the evolution of Celastrales.
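The contig and super-scaffold N50 values quoted in the abstract above follow the usual definition: the length L such that sequences of length at least L together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy contig lengths (hypothetical, arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)  # 80 + 70 = 150 covers half of 290, so N50 is 70
```

A larger N50 at the same total size indicates a more contiguous assembly, which is why the value rises from contigs to Hi-C super-scaffolds.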
Affiliation(s)
- Tianlin Pei
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
- State Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
| | - Mengxiao Yan
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
| | - Yu Kong
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
| | - Hang Fan
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
- State Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
| | - Jie Liu
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
| | - Mengying Cui
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
| | - Yumin Fang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
| | - Binjie Ge
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
| | - Jun Yang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
- State Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
| | - Qing Zhao
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
- State Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
| |
|
266
|
Yang M, Shang X, Zhou Y, Wang C, Wei G, Tang J, Zhang M, Liu Y, Cao J, Zhang Q. Full-Length Transcriptome Analysis of Plasmodium falciparum by Single-Molecule Long-Read Sequencing. Front Cell Infect Microbiol 2021; 11:631545. [PMID: 33708645 PMCID: PMC7942025 DOI: 10.3389/fcimb.2021.631545] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2020] [Accepted: 01/05/2021] [Indexed: 11/25/2022] Open
Abstract
Malaria, an infectious disease caused by Plasmodium parasites, has accounted for a large number of deaths annually in recent decades. Despite the significance of Plasmodium falciparum as a model organism of malaria parasites, our understanding of gene expression in this parasite remains limited, since much of the progress on its genome and transcriptome is based on assembly of short sequencing reads. Herein, we report a new version of the transcriptome dataset containing all full-length transcripts over the whole asexual blood stages, obtained by adopting a full-length sequencing approach with optimized experimental conditions for cDNA library preparation. We identified a total of 393 alternative splicing (AS) events, 3,623 long non-coding RNAs (lncRNAs), 1,555 alternative polyadenylation (APA) events, 57 transcription factors (TFs), and 1,721 fusion transcripts in P. falciparum. Furthermore, shotgun proteomics was performed to validate the full-length transcriptome of P. falciparum. More importantly, integration of full-length transcriptomic and proteomic data identified 160 novel small proteins in lncRNA regions. Collectively, this high-quality, accurate full-length transcriptome dataset and the shotgun proteome analyses shed light on the complex gene expression in malaria parasites and provide a valuable resource for related functional and mechanistic research on P. falciparum genes.
Affiliation(s)
- Mengquan Yang
- Research Center for Translational Medicine, Key Laboratory of Arrhythmias of the Ministry of Education of China, East Hospital, Tongji University School of Medicine, Shanghai, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China; CAS Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Xiaomin Shang
- Research Center for Translational Medicine, Key Laboratory of Arrhythmias of the Ministry of Education of China, East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Yiqing Zhou
- CAS Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Changhong Wang
- Research Center for Translational Medicine, Key Laboratory of Arrhythmias of the Ministry of Education of China, East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Guiying Wei
- Research Center for Translational Medicine, Key Laboratory of Arrhythmias of the Ministry of Education of China, East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Jianxia Tang
- National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, Wuxi, China
| | - Meihua Zhang
- National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, Wuxi, China
| | - Yaobao Liu
- National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, Wuxi, China
| | - Jun Cao
- National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, Wuxi, China; Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Qingfeng Zhang
- Research Center for Translational Medicine, Key Laboratory of Arrhythmias of the Ministry of Education of China, East Hospital, Tongji University School of Medicine, Shanghai, China
| |
|
267
|
Li HD, Zhang W, Luo Y, Wang J. IsoDetect: Detection of Splice Isoforms from Third Generation Long Reads Based on Short Feature Sequences. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200316101205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:
Transcriptome annotation is the basis for understanding gene structures and analysing gene expression. The transcriptome annotation of many organisms, such as humans, is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they are based exclusively on sequence reads, without incorporating the sequence information of annotated isoforms.
Objective:
We aim to develop a method to detect isoforms by incorporating annotated isoforms.
Methods:
Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequences at exon-exon junctions are extracted from annotated isoforms as “short feature sequences”, which are used to distinguish splice isoforms. Second, we align these feature sequences to long reads and partition the long reads into groups that contain the same set of feature sequences, thereby avoiding pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Therefore, our method can detect not only known but also novel isoforms.
Result:
Tested on two datasets from Calypte anna and Zebra Finch, IsoDetect shows higher speed and good accuracy compared with four existing methods.
Conclusion:
IsoDetect may become a promising method for isoform detection.
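The first two steps of the Methods above (extracting junction-spanning feature sequences, then partitioning long reads by the set of features they contain) can be sketched as follows. This is an illustrative simplification, not the IsoDetect code: it assumes exact substring matching where the published method aligns features to error-prone reads, and the names `junction_features` and `group_reads`, the flank length, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def junction_features(exons, flank):
    """Return one short sequence per exon-exon junction of an isoform.

    Each feature joins the last `flank` bases of the upstream exon to the
    first `flank` bases of the downstream exon.
    """
    return [left[-flank:] + right[:flank] for left, right in zip(exons, exons[1:])]


def group_reads(reads, features):
    """Partition reads by the exact set of feature sequences they contain.

    Reads carrying no feature end up under the empty frozenset; in the
    described method such reads are clustered by sequence similarity.
    """
    groups = defaultdict(list)
    for read in reads:
        key = frozenset(f for f in features if f in read)
        groups[key].append(read)
    return dict(groups)


# Toy annotation: isoform B skips the middle exon of isoform A.
isoform_a = ["AAAACC", "GGTTTT", "CCAAGG"]
isoform_b = ["AAAACC", "CCAAGG"]
features = sorted(set(junction_features(isoform_a, 3) + junction_features(isoform_b, 3)))

reads = ["AAAACCGGTTTTCCAAGG", "AAAACCCCAAGG", "GGGGGGGG"]
groups = group_reads(reads, features)
```

Grouping by feature set means that reads are only compared within a group during the later clustering and consensus step, which is the source of the speed-up the abstract describes.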
Affiliation(s)
- Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Wenjing Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Yuwen Luo
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| |
|
268
|
Chen Y, Wan S, Li Q, Dong X, Diao J, Liao Q, Wang GY, Gao ZX. Genome-Wide Integrated Analysis Revealed Functions of lncRNA-miRNA-mRNA Interaction in Growth of Intermuscular Bones in Megalobrama amblycephala. Front Cell Dev Biol 2021; 8:603815. [PMID: 33614620 PMCID: PMC7891300 DOI: 10.3389/fcell.2020.603815] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2020] [Accepted: 12/23/2020] [Indexed: 12/16/2022] Open
Abstract
Intermuscular bone (IB) occurs in the myosepta of teleosts. Its existence has an adverse influence on the edible and economic value of fish, especially for aquaculture species belonging to Cypriniformes. Knowledge of the growth mechanism of IBs is lacking. In this study, we first used single-molecule real-time (SMRT) sequencing technology to improve the draft genome annotation and fully characterize the transcriptome of one typical aquaculture species, blunt snout bream (Megalobrama amblycephala). The long non-coding RNA (lncRNA), microRNA (miRNA), and messenger RNA (mRNA) expression profiles in two IB growth stages (1 and 3 years old) were compared through transcriptome and degradome analyses. A total of 126 miRNAs, 403 mRNAs, and 353 lncRNAs were found to be differentially expressed between the two stages. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed that the significantly upregulated map2k6 and cytc in the MAPK/p53 signaling pathway and the significantly downregulated lama3 and thbs4b in the extracellular matrix (ECM)–receptor pathway may play a key regulatory role in IB growth. Bioinformatics analysis subsequently revealed 14 competing endogenous RNA (ceRNA) pairs related to the growth of IBs, consisting of 10 lncRNAs, 7 miRNAs, and 10 mRNAs. Of these, dre-miR-24b-3p and dre-miR-193b-3p are core regulatory factors interacting with four lncRNAs and three mRNAs, the interaction mechanism of which was also revealed by subsequent experiments at the cellular level. In conclusion, our data showed that IBs had higher activity of cell apoptosis and lower mineralization activity in IB_III compared to IB_I via interaction of MAPK/p53 and ECM–receptor signaling pathways. The downregulated zip1 interacted with miR-24a-3p and lnc017705, decreasing osteoblast differentiation and Ca2+ deposition in the IB_III stage. Our identified functional mRNAs, lncRNAs, and miRNAs provide a data basis for in-depth elucidation of the growth mechanism of teleost IB.
Affiliation(s)
- Yulong Chen
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China
| | - Shiming Wan
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China
| | - Qing Li
- Fisheries Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan Xianfeng Aquaculture Technology Co. Ltd, Wuhan, China
| | - Xiaoru Dong
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China
| | - Jinghan Diao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China
| | - Qing Liao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China
| | - Gui-Ying Wang
- Fisheries Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan Xianfeng Aquaculture Technology Co. Ltd, Wuhan, China
| | - Ze-Xia Gao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China; Engineering Technology Research Center for Fish Breeding and Culture in Hubei Province, Wuhan, China
| |
|
269
|
|
270
|
Identification of Dominant Transcripts in Oxidative Stress Response by a Full-Length Transcriptome Analysis. Mol Cell Biol 2021; 41:MCB.00472-20. [PMID: 33168698 DOI: 10.1128/mcb.00472-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2020] [Accepted: 11/02/2020] [Indexed: 12/30/2022] Open
Abstract
Our body responds to environmental stress by changing the expression levels of a series of cytoprotective enzymes/proteins through multilayered regulatory mechanisms, including the KEAP1-NRF2 system. While NRF2 upregulates the expression of many cytoprotective genes, there are fundamental limitations in short-read RNA sequencing (RNA-Seq), resulting in confusion regarding interpreting the effectiveness of cytoprotective gene induction at the transcript level. To precisely delineate isoform usage in the stress response, we conducted independent full-length transcriptome profiling (isoform sequencing; Iso-Seq) analyses of lymphoblastoid cells from three volunteers under normal and electrophilic stress-induced conditions. We first determined the first exon usage in KEAP1 and NFE2L2 (encoding NRF2) and found the presence of transcript diversity. We then examined changes in isoform usage of NRF2 target genes under stress conditions and identified a few isoforms dominantly expressed in the majority of NRF2 target genes. The expression levels of isoforms determined by Iso-Seq analyses showed striking differences from those determined by short-read RNA-Seq; the latter could be misleading concerning the abundance of transcripts. These results support that transcript usage is tightly regulated to produce functional proteins under electrophilic stress. Our present study strongly argues that there are important benefits that can be achieved by long-read transcriptome sequencing.
271
Rautiainen M, Marschall T. MBG: Minimizer-based Sparse de Bruijn Graph Construction. Bioinformatics 2021; 37:2476-2478. [PMID: 33475133 PMCID: PMC8521641 DOI: 10.1093/bioinformatics/btab004] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 09/17/2020] [Revised: 12/14/2020] [Accepted: 01/06/2021] [Indexed: 12/14/2022]
Abstract
Motivation: De Bruijn graphs can be constructed from short reads efficiently and have been used for many purposes. Traditionally, long-read sequencing technologies have had too high error rates for de Bruijn graph-based methods. Recently, HiFi reads have provided a combination of long-read length and low error rate, which enables de Bruijn graphs to be used with HiFi reads. Results: We have implemented MBG, a tool for building sparse de Bruijn graphs from HiFi reads. MBG outperforms existing tools for building dense de Bruijn graphs and can build a graph of 50× coverage whole human genome HiFi reads in four hours on a single core. MBG also assembles the bacterial E. coli genome into a single contig in 8 s. Availability and implementation: Package manager: https://anaconda.org/bioconda/mbg and source code: https://github.com/maickrau/MBG. Supplementary information: Supplementary data are available at Bioinformatics online.
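The abstract does not spell out how the sparse graph is built, so the following is only a generic illustration of the minimizer idea named in the title, not MBG's implementation. It assumes plain lexicographic ordering of k-mers (real tools typically order by a hash), and the function names `window_minimizers` and `sparse_edges` are hypothetical. It shows how keeping one k-mer per window yields far fewer nodes than a dense de Bruijn graph, with edges linking consecutive selected k-mers.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def window_minimizers(seq, k, w):
    """Pick the smallest k-mer in every window of w consecutive k-mers.

    Assumption: 'smallest' means lexicographically smallest, ties broken
    by leftmost position. Returns sorted (position, k-mer) pairs.
    """
    kms = kmers(seq, k)
    picked = set()
    for start in range(len(kms) - w + 1):
        best = min(range(start, start + w), key=lambda i: (kms[i], i))
        picked.add(best)
    return [(i, kms[i]) for i in sorted(picked)]


def sparse_edges(seq, k, w):
    """Edges of a sparse graph: consecutive minimizers along the read."""
    mins = window_minimizers(seq, k, w)
    return [(a[1], b[1]) for a, b in zip(mins, mins[1:])]


read = "ACGTACGTGA"
minimizers = window_minimizers(read, k=3, w=3)
edges = sparse_edges(read, k=3, w=3)
```

On this toy read, four of the eight 3-mers are kept as nodes; larger windows keep proportionally fewer.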
Affiliation(s)
- Mikko Rautiainen
  - Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
  - Saarbrücken Graduate School for Computer Science, 66123 Saarbrücken, Germany
  - To whom correspondence should be addressed.
- Tobias Marschall
  - Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, 40225 Düsseldorf, Germany
272
Gan W, Chung-Davidson YW, Chen Z, Song S, Cui W, He W, Zhang Q, Li W, Li M, Ren J. Global tissue transcriptomic analysis to improve genome annotation and unravel skin pigmentation in goldfish. Sci Rep 2021; 11:1815. [PMID: 33469041 PMCID: PMC7815744 DOI: 10.1038/s41598-020-80168-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/29/2020] [Accepted: 12/14/2020] [Indexed: 02/06/2023]
Abstract
Goldfish is an ornamental fish with diverse phenotypes. However, the limited genomic resources of goldfish hamper our understanding of the genetic basis for its phenotypic diversity. To provide enriched genomic resources and infer possible mechanisms underlying skin pigmentation, we performed large-scale transcriptomic sequencing on 13 adult goldfish tissues, larvae at one and three days post hatch, and skin tissues with four different color pigmentations. A total of 25.52 Gb and 149.80 Gb of clean data were obtained using the PacBio and Illumina platforms, respectively. We mapped 137,674 non-redundant transcripts onto the goldfish reference genome, of which 5.54% were known isoforms and 78.53% were novel isoforms of the reference genes; the remaining 21,926 were novel isoforms of additional new genes. Both skin-specific and color-specific transcriptomic analyses showed that several significantly enriched genes were known to be involved in melanogenesis, tyrosine metabolism, the PPAR signaling pathway, and folate biosynthesis, among others. Thirteen differentially expressed genes across different color skins were associated with melanogenesis and pteridine synthesis, including mitf, ednrb, mc1r, tyr, mlph and gch1, and with xanthophore differentiation, such as pax7, slc2a11 and slc2a15. These transcriptomic data revealed pathways involved in goldfish pigmentation and improved the gene annotation of the reference genome.
Affiliation(s)
- Wu Gan
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Ministry of Education, Shanghai Ocean University, Shanghai, 201306, China
- Yu-Wen Chung-Davidson
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, 48824, USA
- Zelin Chen
  - South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, 510301, China
- Shiying Song
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Ministry of Education, Shanghai Ocean University, Shanghai, 201306, China
- Wenyao Cui
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Ministry of Education, Shanghai Ocean University, Shanghai, 201306, China
- Wei He
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Ministry of Education, Shanghai Ocean University, Shanghai, 201306, China
- Qinghua Zhang
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Ministry of Education, Shanghai Ocean University, Shanghai, 201306, China
  - International Research Center for Marine Biosciences, Ministry of Science and Technology, Shanghai Ocean University, Shanghai, 201306, China
- Weiming Li
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, 48824, USA
- Mingyou Li
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Ministry of Education, Shanghai Ocean University, Shanghai, 201306, China.
- Jianfeng Ren
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Ministry of Education, Shanghai Ocean University, Shanghai, 201306, China.
  - International Research Center for Marine Biosciences, Ministry of Science and Technology, Shanghai Ocean University, Shanghai, 201306, China.
273
Jin S, Bian C, Jiang S, Han K, Xiong Y, Zhang W, Shi C, Qiao H, Gao Z, Li R, Huang Y, Gong Y, You X, Fan G, Shi Q, Fu H. A chromosome-level genome assembly of the oriental river prawn, Macrobrachium nipponense. Gigascience 2021; 10:giaa160. [PMID: 33459341 PMCID: PMC7812440 DOI: 10.1093/gigascience/giaa160] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/08/2020] [Revised: 11/01/2020] [Accepted: 12/14/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND The oriental river prawn, Macrobrachium nipponense, is an economically important shrimp in China. Male prawns have higher commercial value than females because the former grow faster and reach larger sizes. It is therefore important to reveal sex-differentiation and development mechanisms of the oriental river prawn to enable genetic improvement. RESULTS We sequenced 293.3 Gb of raw Illumina short reads and 405.7 Gb of Pacific Biosciences long reads. The final whole-genome assembly of the oriental river prawn was ∼4.5 Gb in size, with predictions of 44,086 protein-coding genes. A total of 49 chromosomes were determined, with an anchor ratio of 94.7% and a scaffold N50 of 86.8 Mb. A whole-genome duplication event was deduced to have happened 109.8 million years ago. By integration of genome and transcriptome data, 21 genes were predicted as sex-related candidate genes. CONCLUSION The first high-quality chromosome-level genome assembly of the oriental river prawn was obtained. These genomic data, along with transcriptome sequences, are essential for understanding sex-differentiation and development mechanisms in the oriental river prawn, as well as providing genetic resources for in-depth studies on developmental and evolutionary biology in arthropods.
Affiliation(s)
- Shubo Jin
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Chao Bian
  - Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
- Sufei Jiang
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Kai Han
  - BGI-Qingdao, BGI-Shenzhen, Qingdao 266555, China
- Yiwei Xiong
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Wenyi Zhang
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Hui Qiao
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Zijian Gao
  - Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
- Ruihan Li
  - Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
- Yu Huang
  - Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
- Yongsheng Gong
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Xinxin You
  - Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
- Guangyi Fan
  - BGI-Qingdao, BGI-Shenzhen, Qingdao 266555, China
- Qiong Shi
  - Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
- Hongtuo Fu
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
274
He Z, Su Y, Wang T. Full-Length Transcriptome Analysis of Four Different Tissues of Cephalotaxus oliveri. Int J Mol Sci 2021; 22:ijms22020787. [PMID: 33466772 PMCID: PMC7830723 DOI: 10.3390/ijms22020787] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/05/2020] [Revised: 12/30/2020] [Accepted: 01/02/2021] [Indexed: 02/07/2023]
Abstract
Cephalotaxus oliveri is a tertiary relict conifer endemic to China, where it is listed as a national second-level protected plant. This species has experienced severe changes in temperature and precipitation over the past millions of years and has adapted well to harsh environments. In view of global climate change and the species' endangered status, studying how it responds to changes in temperature and precipitation is crucial for its conservation. In this study, single-molecule real-time (SMRT) sequencing and Illumina RNA sequencing were combined to generate the complete transcriptome of C. oliveri. After correcting the SMRT sequencing data with the RNA-seq data, we obtained 63,831 (root), 58,108 (stem), 33,013 (leaf) and 62,436 (male cone) full-length unigenes from the four tissues, with N50 lengths of 2523, 3480, 3181, and 3267 bp, respectively. Additionally, 35,887, 11,306, 36,422, and 25,439 SSRs were detected for the male cone, leaf, root, and stem, respectively. The number of long non-coding RNAs predicted from the root was the largest (11,113), compared with 3408 (stem), 3193 (leaf), and 3107 (male cone) in the other tissues. Functional annotation and enrichment analysis of tissue-specific expressed genes revealed specialized roles of the four tissues in environmental stress response and adaptability. We also characterized the gene families and pathways related to abiotic factors. This work provides a comprehensive transcriptome resource for C. oliveri that will facilitate further studies on the functional genomics and adaptive evolution of the species.
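Several abstracts in this list report an N50 length. As a reminder of what that statistic measures, here is a minimal sketch using the standard definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative and are not taken from any of the cited studies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the cumulative length first reaches at least
    half of the total length.
    """
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical contig lengths
toy_n50 = n50(toy_lengths)
```

Because the statistic depends only on the length distribution, a high N50 indicates long sequences but says nothing about their correctness.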
Affiliation(s)
- Ziqing He
  - School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China;
- Yingjuan Su
  - School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China;
  - Research Institute of Sun Yat-sen University in Shenzhen, Shenzhen 518057, China
  - Correspondence: (Y.S.); (T.W.); Tel.: +86-020-84111939 (Y.S.); +86-020-85280185 (T.W.)
- Ting Wang
  - College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
  - Correspondence: (Y.S.); (T.W.); Tel.: +86-020-84111939 (Y.S.); +86-020-85280185 (T.W.)
275
Liu X, Li X, Wen X, Zhang Y, Ding Y, Zhang Y, Gao B, Zhang D. PacBio full-length transcriptome of wild apple (Malus sieversii) provides insights into canker disease dynamic response. BMC Genomics 2021; 22:52. [PMID: 33446096 PMCID: PMC7809858 DOI: 10.1186/s12864-021-07366-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/25/2020] [Accepted: 01/01/2021] [Indexed: 01/30/2023]
Abstract
BACKGROUND Valsa canker is a serious stem disease of Malus sieversii caused by Valsa mali. However, little is known about the global response mechanism of M. sieversii to V. mali infection. RESULTS Profiles of the phytohormones jasmonic acid (JA) and salicylic acid (SA), together with transcriptome analysis, were used to elaborate on the dynamic response mechanism. We determined that JA was initially produced in response to infection by the necrotrophic pathogen V. mali at the early response stage and was then transduced synergistically with SA at the late response stage. Furthermore, we adopted Pacific Biosciences (PacBio) full-length sequencing to identify differentially expressed transcripts (DETs) during the canker response stage. We obtained 52,538 full-length transcripts, of which 8139 were DETs. A total of 1336 lncRNAs, 23,737 alternative polyadenylation (APA) sites and 3780 putative transcription factors (TFs) were identified. Additionally, functional annotation analysis of DETs indicated that the wild apple response to V. mali infection involves plant-pathogen interaction, plant hormone signal transduction, flavonoid biosynthesis, and phenylpropanoid biosynthesis. The co-expression network of the differentially expressed TFs revealed 264 candidate TF transcripts. Among these candidates, the WRKY family was the most abundant. MsWRKY7 and MsWRKY33 were highly correlated at the early response stage, and MsWRKY6, MsWRKY7, MsWRKY19, MsWRKY33, MsWRKY40, MsWRKY45, MsWRKY51, MsWRKY61 and MsWRKY75 were highly correlated at the late stage. CONCLUSIONS The full-length transcriptomic analysis revealed a series of immune-responsive events in M. sieversii in response to V. mali infection. Regulation of phytohormone signaling pathways played an important role in the response stage. Additionally, the enriched disease resistance pathways and the dynamics of differentially expressed TFs collectively contributed to the immune response. This study provides valuable insights into the dynamic response of M. sieversii to infection by the necrotrophic pathogen V. mali, facilitates understanding of the response mechanisms to canker disease in apple, and supports the identification of potential resistance genes in M. sieversii.
Affiliation(s)
- Xiaojie Liu
  - State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xiaoshuang Li
  - State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
  - Turpan Eremophytes Botanical Garden, Chinese Academy of Sciences, Turpan, China
- Xuejing Wen
  - State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
  - Turpan Eremophytes Botanical Garden, Chinese Academy of Sciences, Turpan, China
- Yan Zhang
  - State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yu Ding
  - State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
- Bei Gao
  - State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
  - Turpan Eremophytes Botanical Garden, Chinese Academy of Sciences, Turpan, China
- Daoyuan Zhang
  - State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China.
  - Turpan Eremophytes Botanical Garden, Chinese Academy of Sciences, Turpan, China.
276
Du H, Diao C, Zhao P, Zhou L, Liu JF. Integrated hybrid de novo assembly technologies to obtain high-quality pig genome using short and long reads. Brief Bioinform 2021; 22:6082823. [PMID: 33429431 DOI: 10.1093/bib/bbaa399] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/22/2020] [Revised: 11/20/2020] [Accepted: 12/08/2020] [Indexed: 11/12/2022]
Abstract
With the rapid progress of sequencing technologies, various types of sequencing reads and assembly algorithms have been designed to construct genome assemblies. Although recent studies have attempted to evaluate the appropriate type of sequencing reads and algorithms for assembling high-quality genomes, it is still a challenge to choose the correct combination for constructing animal genomes. Here, we present a comparative performance assessment of 14 assembly combinations of 9 software programs with different short and long reads of Duroc pig. Based on the results of the optimization process for genome construction, we designed an integrated hybrid de novo assembly pipeline, HSCG, and constructed a draft genome for Duroc pig. Comparison between the new genome and Sus scrofa 11.1 revealed important breakpoints in two S. scrofa 11.1 genes. Our findings may provide new insights into pan-genome analysis of agricultural animals, and the integrated assembly pipeline may serve as a guide for the assembly of other animal genomes.
Affiliation(s)
- Heng Du
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Chenguang Diao
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Pengju Zhao
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Lei Zhou
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Jian-Feng Liu
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
277
Morisse P, Marchet C, Limasset A, Lecroq T, Lefebvre A. Scalable long read self-correction and assembly polishing with multiple sequence alignment. Sci Rep 2021; 11:761. [PMID: 33436980 PMCID: PMC7804095 DOI: 10.1038/s41598-020-80757-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/26/2020] [Accepted: 12/22/2020] [Indexed: 11/09/2022]
Abstract
Third-generation sequencing technologies make it possible to sequence long reads of tens of kbp, which are expected to solve various problems. However, they display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strategy, allowing a massive speedup. CONSENT compares well to the state of the art, and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that efficiently scales to ultra-long reads, and it can process a full human dataset, containing reads reaching up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature for correcting raw assemblies. Our experiments show that CONSENT is 2-38x faster than other polishing tools, while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly is less resource-consuming than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT.
278
Holley G, Beyter D, Ingimundardottir H, Møller PL, Kristmundsdottir S, Eggertsson HP, Halldorsson BV. Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly. Genome Biol 2021; 22:28. [PMID: 33419473 PMCID: PMC7792008 DOI: 10.1186/s13059-020-02244-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 07/17/2020] [Accepted: 12/15/2020] [Indexed: 12/20/2022]
Abstract
A major challenge with long-read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi read assembly.
Affiliation(s)
- Peter L Møller
  - Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Snædis Kristmundsdottir
  - deCODE genetics/Amgen Inc., Reykjavík, Iceland
  - School of Technology, Reykjavik University, Reykjavík, Iceland
- Bjarni V Halldorsson
  - deCODE genetics/Amgen Inc., Reykjavík, Iceland
  - School of Technology, Reykjavik University, Reykjavík, Iceland
279
Tu Z, Shen Y, Wen S, Liu H, Wei L, Li H. A Tissue-Specific Landscape of Alternative Polyadenylation, lncRNAs, TFs, and Gene Co-expression Networks in Liriodendron chinense. FRONTIERS IN PLANT SCIENCE 2021; 12:705321. [PMID: 34367224 PMCID: PMC8343429 DOI: 10.3389/fpls.2021.705321] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/05/2021] [Accepted: 06/28/2021] [Indexed: 05/08/2023]
Abstract
Liriodendron chinense is an economically and ecologically important deciduous tree species. Although the reference genome has been published, alternative polyadenylation (APA), transcription factors (TFs), long non-coding RNAs (lncRNAs), and co-expression networks of tissue-specific genes remain incompletely annotated. In this study, we used the bracts, petals, sepals, stamens, pistils, leaves, and shoot apex of L. chinense as materials for hybrid sequencing. On the one hand, we improved the annotation of the genome. We detected 13,139 novel genes, 7,527 lncRNAs, 1,791 TFs, and 6,721 genes with APA sites. On the other hand, we found that tissue-specific genes play a significant role in maintaining tissue characteristics. In total, 2,040 tissue-specific genes were identified, of which 9.2% were affected by APA, and 1,809 were represented in seven specific co-expression modules. We also found that bract-specific hub genes were associated with plant defense, while leaf-specific hub genes were involved in energy metabolism. Moreover, a stamen-specific hub TF, Lchi25777, may be involved in the determination of stamen identity, and a shoot-apex-specific hub TF, Lchi05072, may participate in maintaining meristem characteristics. Our study provides a landscape of APA, lncRNAs, TFs, and tissue-specific gene co-expression networks in L. chinense that will improve genome annotation, strengthen our understanding of transcriptome complexity, and drive further research into the regulatory mechanisms of tissue-specific genes.
Affiliation(s)
- Zhonghua Tu
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Nanjing Forestry University, Nanjing, China
  - Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China
- Yufang Shen
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Nanjing Forestry University, Nanjing, China
  - Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China
- Shaoying Wen
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Nanjing Forestry University, Nanjing, China
  - Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China
- Huanhuan Liu
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Nanjing Forestry University, Nanjing, China
  - Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China
- Lingmin Wei
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Nanjing Forestry University, Nanjing, China
  - Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China
- Huogen Li
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Nanjing Forestry University, Nanjing, China
  - Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China
  - *Correspondence: Huogen Li,
280
Xiong Y, Yu Q, Xiong Y, Zhao J, Lei X, Liu L, Liu W, Peng Y, Zhang J, Li D, Bai S, Ma X. The Complete Mitogenome of Elymus sibiricus and Insights Into Its Evolutionary Pattern Based on Simple Repeat Sequences of Seed Plant Mitogenomes. FRONTIERS IN PLANT SCIENCE 2021; 12:802321. [PMID: 35154192 PMCID: PMC8826237 DOI: 10.3389/fpls.2021.802321] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 10/26/2021] [Accepted: 12/27/2021] [Indexed: 05/11/2023]
Abstract
The most intriguing characteristics of plant mitochondrial genomes (mitogenomes) include their high variation in both sequence and structure, the extensive horizontal gene transfer (HGT), and the important role they play in hypoxic adaptation. However, the investigation of the mechanisms of hypoxic adaptation and HGT in plant mitochondria remains challenging due to the limited number of sequenced mitogenomes and the non-coding nature of the transferred DNA. In this study, the mitogenome of Elymus sibiricus (Gramineae, Triticeae), a perennial grass species native to the Qinghai-Tibet plateau (QTP), was de novo assembled and compared with the mitogenomes of eight Gramineae species. The unique haplotype composition and higher TE content compared to three other Triticeae species may be attributed to the long-term high-altitude plateau adaptability of E. sibiricus. We aimed to discover the connection between mitogenome simple sequence repeats (SSRs) (mt-SSRs) and HGT. Therefore, we predicted and annotated the mt-SSRs of E. sibiricus along with the sequencing of 87 seed plants. The clustering result based on all of the predicted compound mitogenome SSRs (mt-c-SSRs) revealed an expected synteny within systematic taxa and also inter-taxa. The mt-c-SSRs were annotated to 11 genes, among which "(ATA)3agtcaagtcaag (AAT)3" occurred in the nad5 gene of 8 species. These results further confirmed, from the perspective of mt-c-SSRs, the HGT of mitogenome sequences even among distant species. Two genes, nad4 and nad7, possessed a vast number of SSRs in their intron regions across the seed plant mitogenomes. Furthermore, five pairs of SSRs developed from the mitogenome of E. sibiricus could be considered as potential markers to distinguish between the species E. sibiricus and its related sympatric species E. nutans.
Affiliation(s)
- Yanli Xiong
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Qingqing Yu
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Yi Xiong
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Junming Zhao
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Xiong Lei
  - Sichuan Academy of Grassland Science, Chengdu, China
- Lin Liu
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Wei Liu
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Yan Peng
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Jianbo Zhang
  - Sichuan Academy of Grassland Science, Chengdu, China
- Daxu Li
  - Sichuan Academy of Grassland Science, Chengdu, China
- Shiqie Bai
  - Sichuan Academy of Grassland Science, Chengdu, China
  - *Correspondence: Shiqie Bai,
- Xiao Ma
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
  - Xiao Ma,
281
Firtina C, Kim JS, Alser M, Senol Cali D, Cicek AE, Alkan C, Mutlu O. Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm. Bioinformatics 2020; 36:3669-3679. [PMID: 32167530 DOI: 10.1093/bioinformatics/btaa179] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 02/13/2019] [Revised: 12/16/2019] [Accepted: 03/11/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. RESULTS We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. 
Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/CMU-SAFARI/Apollo. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
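The decoding step named in the abstract (Viterbi over a trained hidden Markov model) can be illustrated with a minimal sketch. This is a generic two-state toy HMM, not Apollo's profile HMM: the state names, observation symbols, and all probabilities below are made-up assumptions chosen only to show how Viterbi picks the most likely hidden path.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observations."""
    # Log space avoids numerical underflow on long sequences.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for s at this position.
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: all names and numbers are illustrative only.
states = ("reliable", "error_prone")
start_p = {"reliable": 0.6, "error_prone": 0.4}
trans_p = {
    "reliable": {"reliable": 0.7, "error_prone": 0.3},
    "error_prone": {"reliable": 0.4, "error_prone": 0.6},
}
emit_p = {
    "reliable": {"match": 0.5, "mismatch": 0.4, "indel": 0.1},
    "error_prone": {"match": 0.1, "mismatch": 0.3, "indel": 0.6},
}

path = viterbi(["match", "mismatch", "indel"], states, start_p, trans_p, emit_p)
print(path)
```

In a polishing setting the hidden states would correspond to positions of the assembly and the observations to aligned read bases; the toy above only demonstrates the dynamic-programming recurrence and traceback.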
Affiliation(s)
- Can Firtina
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
| | - Jeremie S Kim
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland.,Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Mohammed Alser
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
| | - Damla Senol Cali
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - A Ercument Cicek
- Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
| | - Onur Mutlu
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland.,Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA.,Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
| |
|
282
|
Zhang H, Jain C, Aluru S. A comprehensive evaluation of long read error correction methods. BMC Genomics 2020; 21:889. [PMID: 33349243 PMCID: PMC7751105 DOI: 10.1186/s12864-020-07227-0] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 11/08/2020] [Accepted: 11/12/2020] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Third-generation single molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to assess these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. RESULTS In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research. CONCLUSIONS Despite the high error rate of long reads, the state-of-the-art correction tools can achieve high correction quality. When short reads are available, the best hybrid methods outperform non-hybrid methods in terms of correction quality and computing resource usage. When choosing tools for use, practitioners should be careful with a few correction tools that discard reads, and should check the effect of error correction tools on downstream analysis. Our evaluation code is available as open-source at https://github.com/haowenz/LRECE.
Affiliation(s)
- Haowen Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA
| | - Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA
| | - Srinivas Aluru
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA. .,Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, 30332, GA, USA.
| |
|
283
|
Xu D, Yang H, Zhuo Z, Lu B, Hu J, Yang F. Characterization and analysis of the transcriptome in Opisina arenosella from different developmental stages using single-molecule real-time transcript sequencing and RNA-seq. Int J Biol Macromol 2020; 169:216-227. [PMID: 33340629 DOI: 10.1016/j.ijbiomac.2020.12.098] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/21/2020] [Revised: 09/10/2020] [Accepted: 12/12/2020] [Indexed: 02/06/2023]
Abstract
Opisina arenosella is one of the main pests harming coconut trees. To date, there have been few studies on the molecular genetics, biochemistry and physiology of O. arenosella at the transcriptional level, and there are no available reference genomes. Here, Illumina RNA sequencing combined with PacBio single-molecule real-time analysis was applied to study the transcriptome of this pest at different developmental stages, providing reference data for transcript expression analysis. Twelve samples of O. arenosella from different stages of development were sequenced using Illumina RNA sequencing, and the pooled RNA samples were sequenced with PacBio technology (Iso-Seq). A full-length transcriptome with 41,938 transcripts was captured, and the N50 and N90 lengths were 3543 bp and 1646 bp, respectively. A total of 36,925 transcripts were annotated in public databases, 6493 of which were long noncoding RNAs, while 2510 represented alternative splicing events. There were significant differences in the gene expression profiles at different developmental stages, with high levels of differential gene expression associated with growth, development, carbohydrate metabolism and immunity. This work provides resources and information for the study of the transcriptome and gene function of O. arenosella and provides a valuable foundation for understanding the changes in gene expression during development.
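The N50 and N90 statistics quoted above follow a simple definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all bases. A minimal sketch of that calculation is given below; the function name `nxx` and the example lengths are illustrative assumptions, not values from the study.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx length (N50 when fraction=0.5, N90 when fraction=0.9)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence down until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical transcript lengths in bp, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = nxx(example_lengths, 0.5)
n90 = nxx(example_lengths, 0.9)
print(n50, n90)
```

Because N90 requires accumulating more of the total length, it is never larger than N50 for the same set of sequences, which matches the ordering of the two values reported in the abstract.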
Affiliation(s)
- Danping Xu
- College of Life Science, China West Normal University, Nanchong 637002, China
| | - Hongjun Yang
- College of Forestry, Hainan University, Haikou 570228, China
| | - Zhihang Zhuo
- College of Life Science, China West Normal University, Nanchong 637002, China; College of Forestry, Hainan University, Haikou 570228, China.
| | - Baoqian Lu
- Environment and Plant Protection Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China
| | - Jiameng Hu
- College of Forestry, Hainan University, Haikou 570228, China
| | - Fan Yang
- College of Forestry, Hainan University, Haikou 570228, China
| |
|
284
|
Draft Genome of the Common Snapping Turtle, Chelydra serpentina, a Model for Phenotypic Plasticity in Reptiles. G3-GENES GENOMES GENETICS 2020; 10:4299-4314. [PMID: 32998935 PMCID: PMC7718744 DOI: 10.1534/g3.120.401440] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/21/2022]
Abstract
Turtles are iconic reptiles that inhabit a range of ecosystems from oceans to deserts and climates from the tropics to northern temperate regions. Yet, we have little understanding of the genetic adaptations that allow turtles to survive and reproduce in such diverse environments. Common snapping turtles, Chelydra serpentina, are an ideal model species for studying adaptation to climate because they are widely distributed from tropical to northern temperate zones in North America. They are also easy to maintain and breed in captivity and produce large clutch sizes, which makes them amenable to quantitative genetic and molecular genetic studies of traits like temperature-dependent sex determination. We therefore established a captive breeding colony and sequenced DNA from one female using both short and long reads. After trimming and filtering, we had 209.51 Gb of Illumina reads, 25.72 Gb of PacBio reads, and 21.72 Gb of Nanopore reads. The assembled genome was 2.258 Gb in size and had 13,224 scaffolds with an N50 of 5.59 Mb. The longest scaffold was 27.24 Mb. BUSCO analysis revealed 97.4% of core vertebrate genes in the genome. We identified 3.27 million SNPs in the reference turtle, which indicates a relatively high level of individual heterozygosity. We assembled the transcriptome using RNA-Seq data and used gene prediction software to produce 22,812 models of protein coding genes. The quality and contiguity of the snapping turtle genome are similar to or better than most published reptile genomes. The genome and genetic variants identified here provide a foundation for future studies of adaptation to climate.
|
285
|
Zhou SY, Dong QL, Zhu KS, Gao L, Chen X, Xiang H. Long-read transcriptomic analysis of orb-weaving spider Araneus ventricosus indicates transcriptional diversity of spidroins. Int J Biol Macromol 2020; 168:395-402. [PMID: 33275979 DOI: 10.1016/j.ijbiomac.2020.11.182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/10/2019] [Revised: 08/31/2020] [Accepted: 11/26/2020] [Indexed: 12/01/2022]
Abstract
Spider silk, which is composed of diverse silk proteins (spidroins), is a natural high-mass biomaterial with great potential. However, due to the complexity of both the structure and the composition of the spidroins in natural spider silk, application of this valuable biomass is still limited to date. Orb-weaving spiders produce diverse kinds of silk with different mechanical and structural characteristics. In order to systematically illustrate the landscape of all the different spidroins, here we chose Araneus ventricosus, an orb-weaving spider with superior silk mechanical features and genome information, to generate a long-read whole body transcriptome. We deciphered the repeat arrangements of each kind of spidroin, based on which we found that there is substantial transcriptional diversity for each spidroin gene. Some repeat motifs have not been documented before. Specifically, we discovered a novel full-length MaSp transcript as well as relatively small full-length AcSp isoforms, which are promising materials for bioengineering of recombinant spidroin. Our study provides a batch of new spidroin resources with detailed sequence information. The finding of transcriptional diversity may provide cues for understanding within-species variation in the mechanical properties of natural spider silk and for further molecular design of recombinant spidroin.
Affiliation(s)
- Shi-Yi Zhou
- Guangdong Provincial Key Laboratory of Insect Developmental Biology and Applied Technology, Guangzhou Key Laboratory of Insect Development Regulation and Application Research, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou 510631, China
| | - Qing-Lin Dong
- State Key Laboratory of Molecular Engineering of Polymers, Laboratory of Advanced Materials and Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Ke-Sen Zhu
- Guangdong Provincial Key Laboratory of Insect Developmental Biology and Applied Technology, Guangzhou Key Laboratory of Insect Development Regulation and Application Research, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou 510631, China
| | - Lei Gao
- Guangdong Provincial Key Laboratory of Insect Developmental Biology and Applied Technology, Guangzhou Key Laboratory of Insect Development Regulation and Application Research, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou 510631, China.
| | - Xin Chen
- State Key Laboratory of Molecular Engineering of Polymers, Laboratory of Advanced Materials and Department of Macromolecular Science, Fudan University, Shanghai 200433, China.
| | - Hui Xiang
- Guangdong Provincial Key Laboratory of Insect Developmental Biology and Applied Technology, Guangzhou Key Laboratory of Insect Development Regulation and Application Research, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou 510631, China.
| |
|
286
|
Steyaert A, Audenaert P, Fostier J. Accurate determination of node and arc multiplicities in de bruijn graphs using conditional random fields. BMC Bioinformatics 2020; 21:402. [PMID: 32928110 PMCID: PMC7491180 DOI: 10.1186/s12859-020-03740-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2020] [Accepted: 09/04/2020] [Indexed: 12/01/2022] Open
Abstract
Background De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence, also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence difficult. A key step in this process is the inference of the multiplicities of nodes and arcs in the graph. These multiplicities correspond to the number of times each k-mer (resp. k+1-mer) implied by a node (resp. arc) is present in the genomic sequence. Determining multiplicities thus reveals the repeat structure and presence of sequencing errors. Multiplicities of nodes/arcs in the de Bruijn graph are reflected in their coverage; however, coverage variability and coverage biases render their determination ambiguous. Current methods to determine node/arc multiplicities base their decisions solely on the information in nodes and arcs individually, under-utilising the information present in the sequencing data. Results To improve the accuracy with which node and arc multiplicities in a de Bruijn graph are inferred, we developed a conditional random field (CRF) model to efficiently combine the coverage information within each node/arc individually with the information of surrounding nodes and arcs. Multiplicities are thus collectively assigned in a more consistent manner. Conclusions We demonstrate that the CRF model yields significant improvements in accuracy and a more robust expectation-maximisation parameter estimation. True k-mers can be distinguished from erroneous k-mers with a higher F1 score than existing methods. A C++11 implementation is available at https://github.com/biointec/detox under the GNU AGPL v3.0 license.
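The raw signal that multiplicity inference starts from, namely the coverage of each node (k-mer) and arc ((k+1)-mer), can be sketched in a few lines. This is only an illustration of how those counts arise from reads; it is not the CRF model described above, it ignores reverse complements, and the reads and the helper name `count_kmers` are made up for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each length-k substring occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Two hypothetical overlapping reads (illustrative only).
reads = ["ACGTAC", "CGTACG"]
k = 3

node_coverage = count_kmers(reads, k)      # nodes of the de Bruijn graph
arc_coverage = count_kmers(reads, k + 1)   # arcs connect overlapping k-mers

print(node_coverage)
print(arc_coverage)
```

With real data, coverage values like these are noisy, which is why a model that also considers neighbouring nodes and arcs can assign multiplicities more consistently than a per-node threshold.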
|
287
|
Deng N, Hou C, He B, Ma F, Song Q, Shi S, Liu C, Tian Y. A full-length transcriptome and gene expression analysis reveal genes and molecular elements expressed during seed development in Gnetum luofuense. BMC PLANT BIOLOGY 2020; 20:531. [PMID: 33228526 PMCID: PMC7685604 DOI: 10.1186/s12870-020-02729-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/19/2020] [Accepted: 10/31/2020] [Indexed: 05/07/2023]
Abstract
BACKGROUND Gnetum is an economically important tropical and subtropical gymnosperm genus with various dietary, industrial and medicinal uses. Many carbohydrates, proteins and fibers accumulate during the ripening of Gnetum seeds. However, the molecular mechanisms related to this process remain unknown. RESULTS We therefore assembled a full-length transcriptome from immature and mature G. luofuense seeds using PacBio sequencing reads. We identified a total of 5726 novel genes, 9061 alternative splicing events, 3551 lncRNAs, 2160 transcription factors, and we found that 8512 genes possessed at least one poly(A) site. In addition, gene expression comparisons of six transcriptomes generated by Illumina sequencing showed that 14,323 genes were differentially expressed from an immature stage to a mature stage with 7891 genes upregulated and 6432 genes downregulated. The expression of 14 differentially expressed transcription factors from the MADS-box, Aux/IAA and bHLH families was validated by qRT-PCR, suggesting that they may have important roles in seed ripening of G. luofuense. CONCLUSIONS These findings provide a valuable molecular resource for understanding seed development of gymnosperms.
Affiliation(s)
- Nan Deng
- Hunan Academy of Forestry, Changsha, Hunan, No.658 Shaoshan Road, Tianxin District, Changsha, 410004, China
- Hunan Cili Forest Ecosystem State Research Station, Cili, Changsha, 410004, Hunan, China
| | - Chen Hou
- Guangdong Academy of Forestry, Guangzhou, 510520, China
- Guangdong Provincial Key Laboratory of Silviculture, Protection and Utilization, Guangdong Academy of Forestry, Guangzhou, 510520, China
| | - Boxiang He
- Guangdong Academy of Forestry, Guangzhou, 510520, China
- Guangdong Provincial Key Laboratory of Silviculture, Protection and Utilization, Guangdong Academy of Forestry, Guangzhou, 510520, China
| | - Fengfeng Ma
- Hunan Academy of Forestry, Changsha, Hunan, No.658 Shaoshan Road, Tianxin District, Changsha, 410004, China
- Hunan Cili Forest Ecosystem State Research Station, Cili, Changsha, 410004, Hunan, China
| | - Qingan Song
- Hunan Academy of Forestry, Changsha, Hunan, No.658 Shaoshan Road, Tianxin District, Changsha, 410004, China
- Hunan Cili Forest Ecosystem State Research Station, Cili, Changsha, 410004, Hunan, China
| | - Shengqing Shi
- State Key Laboratory of Tree Genetics and Breeding, Research Institute of Forestry, Chinese Academy of Forestry, No. 1 Dongxiaofu, Xiangshan Road, Haidian, Beijing, 100091, China
| | - Caixia Liu
- Hunan Academy of Forestry, Changsha, Hunan, No.658 Shaoshan Road, Tianxin District, Changsha, 410004, China.
| | - Yuxin Tian
- Hunan Academy of Forestry, Changsha, Hunan, No.658 Shaoshan Road, Tianxin District, Changsha, 410004, China.
- Hunan Cili Forest Ecosystem State Research Station, Cili, Changsha, 410004, Hunan, China.
| |
|
288
|
Liu J, Wang J, Xiao X, Lai X, Dai D, Zhang X, Zhu X, Zhao Z, Wang J, Li Z. A hybrid correcting method considering heterozygous variations by a comprehensive probabilistic model. BMC Genomics 2020; 21:753. [PMID: 33208104 PMCID: PMC7677778 DOI: 10.1186/s12864-020-07008-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022] Open
Abstract
Background The emergence of third-generation sequencing technology, featuring longer read lengths, is a great advance over next-generation sequencing technology and has greatly promoted biological research. However, third-generation sequencing data have high sequencing error rates, which inevitably affect downstream analysis. Although sequencing error rates have been improving in recent years, large amounts of data were produced at high error rates, and discarding them would be a huge waste. Thus, error correction for third-generation sequencing data is especially important. The existing error correction methods perform poorly at heterozygous sites, which are ubiquitous in diploid and polyploid organisms. Therefore, there is a lack of error correction algorithms for heterozygous loci, especially at low coverage. Results In this article, we propose an error correction method, named QIHC. QIHC is a hybrid correction method, which needs both next-generation and third-generation sequencing data. QIHC greatly enhances the sensitivity of distinguishing heterozygous sites from sequencing errors, which leads to high accuracy in error correction. To achieve this, QIHC establishes a set of probabilistic models based on a Bayesian classifier to estimate the heterozygosity of a site, and makes a judgment by calculating the posterior probabilities. The proposed method consists of three modules, which respectively generate a pseudo reference sequence, obtain the read alignments, and estimate the heterozygosity of the sites and correct the reads harboring them. The last module is the core module of QIHC, and is designed to handle the calculations for multiple cases at a heterozygous site. The other two modules enable the reads to be mapped to the pseudo reference sequence, which to some extent overcomes the inefficiency of the multiple mappings adopted by the existing error correction methods.
Conclusions To verify the performance of our method, we selected Canu and Jabba to compare with QIHC in several aspects. Because QIHC is a hybrid correction method, we first conducted a group of experiments under different coverages of the next-generation sequencing data. QIHC is far ahead of Jabba in accuracy. We then varied the coverage of the third-generation sequencing data and again compared the performance of Canu, Jabba and QIHC. QIHC outperforms the other two methods in the accuracy of both correcting sequencing errors and identifying heterozygous sites, especially at low coverage. We also carried out a comparison between Canu and QIHC at different error rates of the third-generation sequencing data. QIHC still performs better. Therefore, QIHC is superior to the existing error correction methods when heterozygous sites exist.
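The idea of separating heterozygous sites from sequencing errors by comparing posterior probabilities can be illustrated with a deliberately simplified two-hypothesis model. This is not QIHC's probabilistic model: the binomial likelihoods, the default error rate, the prior, and the function name `posterior_het` are all assumptions made for this sketch.

```python
from math import comb


def posterior_het(ref_count, alt_count, error_rate=0.1, prior_het=0.01):
    """Posterior probability that a site is heterozygous, given read counts.

    Simplified assumptions (illustrative only):
    - homozygous-reference site: alternative bases arise only from errors,
      each read independently with probability `error_rate`;
    - heterozygous site: each read shows the alternative allele with
      probability 0.5 (sequencing errors are ignored in this case).
    """
    n = ref_count + alt_count
    ways = comb(n, alt_count)
    lik_hom = ways * error_rate ** alt_count * (1 - error_rate) ** ref_count
    lik_het = ways * 0.5 ** n
    weighted_het = prior_het * lik_het
    return weighted_het / (weighted_het + (1 - prior_het) * lik_hom)


balanced = posterior_het(ref_count=5, alt_count=5)   # looks heterozygous
all_ref = posterior_het(ref_count=10, alt_count=0)   # looks homozygous
print(round(balanced, 3), round(all_ref, 6))
```

Under these assumptions a balanced allele split at modest coverage already favours the heterozygous hypothesis, whereas a site with no alternative reads stays close to the prior or below it; a fuller model would also account for errors at heterozygous sites and for mapping uncertainty.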
Affiliation(s)
- Jiaqi Liu
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Jiayin Wang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China. .,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.
| | - Xiao Xiao
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,School of Public Policy and Administration, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Xin Lai
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Daocheng Dai
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Xuanping Zhang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Xiaoyan Zhu
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Zhongmeng Zhao
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Juan Wang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Annoroad Gene Institute, Beijing, 100176, China
| | - Zhimin Li
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China. .,Annoroad Gene Institute, Beijing, 100176, China.
| |
|
289
|
Zhang S, Liu Q, Lyu C, Chen J, Xiao R, Chen J, Yang Y, Zhang H, Hou K, Wu W. Characterizing glycosyltransferases by a combination of sequencing platforms applied to the leaf tissues of Stevia rebaudiana. BMC Genomics 2020; 21:794. [PMID: 33187479 PMCID: PMC7664074 DOI: 10.1186/s12864-020-07195-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/23/2020] [Accepted: 10/27/2020] [Indexed: 01/21/2023] Open
Abstract
Background Stevia rebaudiana (Bertoni) is considered one of the most valuable plants because of the steviol glycosides (SGs) that can be extracted from its leaves. Glycosyltransferases (GTs), which can transfer sugar moieties from activated sugar donors onto saccharide and nonsaccharide acceptors, are widely distributed in the genome of S. rebaudiana and play important roles in the synthesis of steviol glycosides. Results Six stevia genotypes with significantly different concentrations of SGs were obtained by induction through various mutagenic methods, and the contents of seven glycosides (stevioboside, Reb B, ST, Reb A, Reb F, Reb D and Reb M) in their leaves were considerably different. Then, NGS and single-molecule real-time (SMRT) sequencing were combined to analyse leaf tissue from these six different genotypes to generate a full-length transcriptome of S. rebaudiana. Two phylogenetic trees of glycosyltransferases (SrUGTs) were constructed by the neighbour-joining method and successfully predicted the functions of SrUGTs involved in SG biosynthesis. With further insight into glycosyltransferases (SrUGTs) involved in SG biosynthesis, the weighted gene co-expression network analysis (WGCNA) method was used to characterize the relationships between SrUGTs and SGs, and forty-four potential SrUGTs were finally obtained, including SrUGT85C2, SrUGT74G1, SrUGT76G1 and SrUGT91D2, which have already been reported to be involved in the glucosylation of steviol glycosides, illustrating the reliability of our results. Conclusion Combined with the results obtained by previous studies and those of this work, we systematically characterized glycosyltransferases in S. rebaudiana and forty-four candidate SrUGTs involved in the glycosylation of steviol glucosides were obtained. Moreover, the full-length transcriptome obtained in this study will provide valuable support for further research investigating S. rebaudiana. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-020-07195-5.
Affiliation(s)
- Shaoshan Zhang
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China.,Institute of Qinghai-Tibetan Plateau, Southwest Minzu University, Chengdu, 610041, China
| | - Qiong Liu
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China
| | - Chengcheng Lyu
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China
| | - Jinsong Chen
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China
| | - Renfeng Xiao
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China
| | - Jingtian Chen
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China
| | - Yunshu Yang
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China
| | - Huihui Zhang
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China
| | - Kai Hou
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China
| | - Wei Wu
- Agronomy College, Sichuan Agricultural University, Chengdu, 611130, China.
| |
|
290
|
Yang H, Xu D, Zhuo Z, Hu J, Lu B. Transcriptome and gene expression analysis of Rhynchophorus ferrugineus (Coleoptera: Curculionidae) during developmental stages. PeerJ 2020; 8:e10223. [PMID: 33194414 PMCID: PMC7643551 DOI: 10.7717/peerj.10223] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/05/2020] [Accepted: 09/29/2020] [Indexed: 01/15/2023] Open
Abstract
Background Red palm weevil, Rhynchophorus ferrugineus Olivier, is one of the most destructive pests harming palm trees. However, genomic resources for R. ferrugineus are still lacking, limiting the ability to discover molecular and genetic means of pest control. Methods In this study, PacBio Iso-Seq and Illumina RNA-seq were used to generate a transcriptome from three developmental stages of R. ferrugineus (pupa, 7th-instar larva, adult) to increase the understanding of the life cycle and molecular characteristics of the pest. Results Sequencing generated 625,983,256 clean reads, from which 63,801 full-length transcripts were assembled with an N50 of 3,547 bp. Expression analyses revealed 8,583 differentially expressed genes (DEGs). Moreover, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis revealed that these DEGs were mainly related to the peroxisome pathway, which is associated with metabolic pathways, material transportation and organ tissue formation. In summary, this work provides a valuable basis for further research on the growth and development, gene expression and gene prediction, and pest control of R. ferrugineus.
Affiliation(s)
- Hongjun Yang
- College of Life Science, China West Normal University, Nanchong, Sichuan, China.,Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants, Ministry of Education, Key Laboratory of Germplasm Resources Biology of Tropical Special Ornamental Plants of Hainan Province, College of Forestry, Hainan University, Haikou, Hainan,China
| | - Danping Xu
- College of Life Science, China West Normal University, Nanchong, Sichuan, China
| | - Zhihang Zhuo
- College of Life Science, China West Normal University, Nanchong, Sichuan, China; Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants, Ministry of Education, Key Laboratory of Germplasm Resources Biology of Tropical Special Ornamental Plants of Hainan Province, College of Forestry, Hainan University, Haikou, Hainan, China; Key Laboratory of Integrated Pest Management on Crops in South China, Ministry of Agriculture, South China Agricultural University, Guangzhou, Guangdong, China
| | - Jiameng Hu
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants, Ministry of Education, Key Laboratory of Germplasm Resources Biology of Tropical Special Ornamental Plants of Hainan Province, College of Forestry, Hainan University, Haikou, Hainan, China
| | - Baoqian Lu
- Key Laboratory of Integrated Pest Management on Tropical Crops, Ministry of Agriculture China, Environment and Plant Protection Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, Hainan, China
| |
|
291
|
Zheng J, Wang P, Mao Y, Su Y, Wang J. Full-length transcriptome analysis provides new insights into the innate immune system of Marsupenaeus japonicus. FISH & SHELLFISH IMMUNOLOGY 2020; 106:283-295. [PMID: 32755684 DOI: 10.1016/j.fsi.2020.07.018] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/17/2020] [Revised: 07/06/2020] [Accepted: 07/09/2020] [Indexed: 06/11/2023]
Abstract
As invertebrates, shrimp are generally thought to rely solely on their innate immune system to combat invading pathogens. Recently, an increasing number of studies have revealed that the innate immune response of invertebrates exhibits diversity and specificity based on their diverse immune molecules. Herein, a full-length transcriptome analysis of several immune-related tissues (hepatopancreas, gill, hemocytes, stomach and intestine) in the kuruma shrimp (Marsupenaeus japonicus) was conducted to identify immune-related molecules with a focus on transcript variations. In total, 11,222 nonredundant full-length transcripts with an N50 length of 5174 bp were obtained, and most of these transcripts (94.84%) were successfully annotated. In addition, a total of 147 long noncoding RNAs (lncRNAs) were predicted. Importantly, transcript variants of several vital immune-related genes were observed, including twenty-five alpha-2-macroglobulins (α2-Ms), ten Toll-like receptors (TLRs), six C-type lectins (CTLs), five M-type lectins (MTLs) and three Down syndrome cell adhesion molecules (Dscams). Furthermore, 509 nonredundant full-length transcripts were predicted to be generated from alternative splicing (AS) events, which contribute to the diversity of immune molecules. Overall, our study provides valuable data on the full-length transcripts of M. japonicus, which will facilitate the exploration of immune molecules in this species. Moreover, the numerous transcript variants of immune molecules detected in this study provide clues for further investigating the diversity and specificity of the innate immune response in shrimp.
Affiliation(s)
- Jinbin Zheng
- School of Marine Sciences, Ningbo University, Ningbo, 315211, China
| | - Panpan Wang
- Jiangsu Key Laboratory of Marine Biotechnology, Jiangsu Ocean University, Lianyungang, 222005, China
| | - Yong Mao
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, 361102, China; Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, Xiamen University, Xiamen, 361102, China.
| | - Yongquan Su
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, 361102, China
| | - Jun Wang
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, 361102, China
| |
|
292
|
Kuo RI, Cheng Y, Zhang R, Brown JWS, Smith J, Archibald AL, Burt DW. Illuminating the dark side of the human transcriptome with long read transcript sequencing. BMC Genomics 2020; 21:751. [PMID: 33126848 PMCID: PMC7596999 DOI: 10.1186/s12864-020-07123-7] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Received: 04/15/2020] [Accepted: 10/06/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND The human transcriptome annotation is regarded as one of the most complete of any eukaryotic species. However, limitations in sequencing technologies have biased the annotation toward multi-exonic protein coding genes. Accurate high-throughput long read transcript sequencing can now provide additional evidence for rare transcripts and genes such as mono-exonic and non-coding genes that were previously either undetectable or impossible to differentiate from sequencing noise. RESULTS We developed the Transcriptome Annotation by Modular Algorithms (TAMA) software to leverage the power of long read transcript sequencing and address the issues with current data processing pipelines. TAMA achieved high sensitivity and precision for gene and transcript model predictions in both reference guided and unguided approaches in our benchmark tests using simulated Pacific Biosciences (PacBio) and Nanopore sequencing data and real PacBio datasets. By analyzing PacBio Sequel II Iso-Seq sequencing data of the Universal Human Reference RNA (UHRR) using TAMA and other commonly used tools, we found that the convention of using alignment identity to measure error correction performance does not reflect actual gain in accuracy of predicted transcript models. In addition, inter-read error correction can cause major changes to read mapping, resulting in potentially over 6 K erroneous gene model predictions in the Iso-Seq based human genome annotation. Using TAMA's genome assembly based error correction and gene feature evidence, we predicted 2566 putative novel non-coding genes and 1557 putative novel protein coding gene models. CONCLUSIONS Long read transcript sequencing data has the power to identify novel genes within the highly annotated human genome. The use of parameter tuning and extensive output information of the TAMA software package allows for in depth exploration of eukaryotic transcriptomes. 
We have found long-read data-based evidence for thousands of unannotated genes within the human genome. More development in sequencing library preparation and data processing is required for differentiating sequencing noise from real genes in long read RNA sequencing data.
Affiliation(s)
- Richard I Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK.
| | - Yuanyuan Cheng
- The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
| | - Runxuan Zhang
- Information and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, UK
| | - John W S Brown
- Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, UK
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, UK
| | - Jacqueline Smith
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
| | - Alan L Archibald
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
| | - David W Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
- The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
| |
|
293
|
The sockeye salmon genome, transcriptome, and analyses identifying population defining regions of the genome. PLoS One 2020; 15:e0240935. [PMID: 33119641 PMCID: PMC7595290 DOI: 10.1371/journal.pone.0240935] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/01/2020] [Accepted: 10/06/2020] [Indexed: 12/12/2022]
Abstract
Sockeye salmon (Oncorhynchus nerka) is a commercially and culturally important species to the people that live along the northern Pacific Ocean coast. There are two main sockeye salmon ecotypes—the ocean-going (anadromous) ecotype and the fresh-water ecotype known as kokanee. The goal of this study was to better understand the population structure of sockeye salmon and identify possible genomic differences among populations and between the two ecotypes. In pursuit of this goal, we generated the first reference sockeye salmon genome assembly and an RNA-seq transcriptome data set to better annotate features of the assembly. Resequenced whole-genomes of 140 sockeye salmon and kokanee were analyzed to understand population structure and identify genomic differences between ecotypes. Three distinct geographic and genetic groups were identified from analyses of the resequencing data. Nucleotide variants in an immunoglobulin heavy chain variable gene cluster on chromosome 26 were found to differentiate the northwestern group from the southern and upper Columbia River groups. Several candidate genes were found to be associated with the kokanee ecotype. Many of these genes were related to ammonia tolerance or vision. Finally, the sex chromosomes of this species were better characterized, and an alternative sex-determination mechanism was identified in a subset of upper Columbia River kokanee.
|
294
|
Genetic Determinants of Resistance to Extended-Spectrum Cephalosporin and Fluoroquinolone in Escherichia coli Isolated from Diseased Pigs in the United States. mSphere 2020; 5:5/5/e00990-20. [PMID: 33115839 PMCID: PMC8534314 DOI: 10.1128/msphere.00990-20] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/12/2022]
Abstract
Fluoroquinolones and cephalosporins are critically important antimicrobial classes for both human and veterinary medicine. We previously found a drastic increase in enrofloxacin resistance in clinical Escherichia coli isolates collected from diseased pigs from the United States over 10 years (2006 to 2016). However, the genetic determinants responsible for this increase have yet to be determined. The aim of the present study was to identify and characterize the genetic basis of resistance against fluoroquinolones (enrofloxacin) and extended-spectrum cephalosporins (ceftiofur) in swine E. coli isolates using whole-genome sequencing (WGS). blaCMY-2 (carried by IncA/C2, IncI1, and IncI2 plasmids), blaCTX-M (carried by IncF, IncHI2, and IncN plasmids), and blaSHV-12 (carried by IncHI2 plasmids) genes were present in 87 (82.1%), 19 (17.9%), and 3 (2.83%) of the 106 ceftiofur-resistant isolates, respectively. Of the 110 enrofloxacin-resistant isolates, 90 (81.8%) had chromosomal mutations in gyrA, gyrB, parA, and parC genes. Plasmid-mediated quinolone resistance genes [qnrB77, qnrB2, qnrS1, qnrS2, and aac-(6)-lb′-cr] borne on ColE, IncQ2, IncN, IncF, and IncHI2 plasmids were present in 24 (21.8%) of the enrofloxacin-resistant isolates. Virulent IncF plasmids present in swine E. coli isolates were highly similar to epidemic plasmids identified globally. High-risk E. coli clones, such as ST744, ST457, ST131, ST69, ST10, ST73, ST410, ST12, ST127, ST167, ST58, ST88, ST617, ST23, etc., were also found in the U.S. swine population. Additionally, the colistin resistance gene (mcr-9) was present in several isolates. This study adds valuable information regarding resistance to critical antimicrobials with implications for both animal and human health. 
IMPORTANCE Understanding the genetic mechanisms conferring resistance is critical to design informed control and preventive measures, particularly when involving critically important antimicrobial classes such as extended-spectrum cephalosporins and fluoroquinolones. The genetic determinants of extended-spectrum cephalosporin and fluoroquinolone resistance were highly diverse, with multiple plasmids, insertion sequences, and genes playing key roles in mediating resistance in swine Escherichia coli. Plasmids assembled in this study are known to be disseminated globally in both human and animal populations and environmental samples, and E. coli in pigs might be part of a global reservoir of key antimicrobial resistance (AMR) elements. Virulent plasmids found in this study have been shown to confer fitness advantages to pathogenic E. coli strains. The presence of international, high-risk zoonotic clones provides worrisome evidence that resistance in swine isolates may have indirect public health implications, and the swine population as a reservoir for these high-risk clones should be continuously monitored.
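The percentages in this abstract can be checked directly from the reported counts. The short sketch below recomputes them; the dictionary layout and the helper name `percent` are illustrative assumptions, while the counts and totals are the ones stated in the abstract.

```python
def percent(count, total, digits=1):
    """Return count as a percentage of total, rounded to `digits` places."""
    return round(100 * count / total, digits)


# Counts reported in the abstract, out of 106 ceftiofur-resistant isolates.
ceftiofur_total = 106
ceftiofur_counts = {"blaCMY-2": 87, "blaCTX-M": 19, "blaSHV-12": 3}

# Counts reported in the abstract, out of 110 enrofloxacin-resistant isolates.
enrofloxacin_total = 110
enrofloxacin_counts = {"chromosomal mutations": 90, "PMQR genes": 24}

for gene, count in ceftiofur_counts.items():
    print(gene, percent(count, ceftiofur_total, digits=2))

for mechanism, count in enrofloxacin_counts.items():
    print(mechanism, percent(count, enrofloxacin_total))

# The per-gene counts sum to more than the number of isolates in each group.
print(sum(ceftiofur_counts.values()), sum(enrofloxacin_counts.values()))
```

In both groups the category counts add up to more than the number of isolates (109 versus 106, and 114 versus 110). Assuming each count is a number of isolates, this suggests that some isolates carry more than one resistance determinant, although the abstract does not state this explicitly.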
|
295
|
Grabski DF, Broseus L, Kumari B, Rekosh D, Hammarskjold ML, Ritchie W. Intron retention and its impact on gene expression and protein diversity: A review and a practical guide. WILEY INTERDISCIPLINARY REVIEWS-RNA 2020; 12:e1631. [PMID: 33073477 DOI: 10.1002/wrna.1631] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 07/11/2020] [Revised: 09/16/2020] [Accepted: 09/23/2020] [Indexed: 12/12/2022]
Abstract
Intron retention (IR) occurs when a complete and unspliced intron remains in mature mRNA. An increasing body of literature has demonstrated a major role for IR in numerous biological functions, including several that impact human health and disease. Although experimental technologies used to study other forms of mRNA splicing can also be used to investigate IR, a specialized downstream computational analysis is optimal for IR discovery and analysis. Here we provide a review of IR and its biological implications, as well as a practical guide for how to detect and analyze it. Several methods, including long-read third-generation direct RNA sequencing, are described. We have developed an R package, FakIR, to facilitate the execution of the bioinformatic tasks recommended in this review, along with a tutorial on how to adapt them to users' aims. Additionally, we provide guidelines and experimental protocols to validate IR discovery and to evaluate the potential impact of IR on gene expression and protein output. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA Processing > Splicing Regulation/Alternative Splicing; RNA Methods > RNA Analyses in vitro and In Silico.
Affiliation(s)
- David F Grabski
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, Charlottesville, Virginia, USA; Myles H. Thaler Center for AIDS and Human Retrovirus Research, University of Virginia, Charlottesville, Virginia, USA
| | - Lucile Broseus
- IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
| | - Bandana Kumari
- IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
| | - David Rekosh
- Myles H. Thaler Center for AIDS and Human Retrovirus Research, University of Virginia, Charlottesville, Virginia, USA; Department of Microbiology, Immunology and Cancer Biology, University of Virginia School of Medicine, Charlottesville, Virginia, USA
| | - Marie-Louise Hammarskjold
- Myles H. Thaler Center for AIDS and Human Retrovirus Research, University of Virginia, Charlottesville, Virginia, USA; Department of Microbiology, Immunology and Cancer Biology, University of Virginia School of Medicine, Charlottesville, Virginia, USA
| | - William Ritchie
- IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
| |
|
296
|
Li L, Liu H, Wen W, Huang C, Li X, Xiao S, Wu M, Shi J, Xu D. Full Transcriptome Analysis of Callus Suspension Culture System of Bletilla striata. Front Genet 2020; 11:995. [PMID: 33193583 PMCID: PMC7593603 DOI: 10.3389/fgene.2020.00995] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/26/2020] [Accepted: 08/05/2020] [Indexed: 12/13/2022]
Abstract
Background Bletilla striata has been widely used in the pharmacology industry. To effectively produce secondary metabolites through suspension-cultured cells of B. striata, it is important to explore the full-length transcriptome data and the genes related to cell growth and chemical production across all culture stages. We applied a combination of single-molecule real-time (SMRT) sequencing and second-generation sequencing (SGS) to generate the complete and full-length transcriptome of B. striata suspension-cultured cells. Methods The B. striata transcriptome was assembled de novo using PacBio isoform sequencing (Iso-Seq) on a pooled RNA sample derived from 23 samples of 10 culture stages, to explore the potential for capturing full-length transcript isoforms. All unigenes were obtained after splicing, assembling, and clustering, and were corrected using the SGS results. The obtained unigenes were compared against the databases, and their functions were annotated and classified. Results and conclusions A total of 100,276 high-quality full-length transcripts were obtained, with an average length of 2530 bp and an N50 of 3302 bp. About 52% of total sequences were annotated against the Gene Ontology, 53,316 unigenes were hit by KOG annotations and divided into 26 functional categories, and 80,020 unigenes were mapped by KEGG annotations and clustered into 363 pathways. Furthermore, 15,133 long non-coding RNAs (lncRNAs) were detected. In addition, 68,996 coding sequences were identified based on SSR analysis, among which 31 randomly selected primer pairs amplified stable bands. In conclusion, our results provide new full-length transcriptome data and genetic resources for identifying growth- and metabolism-related genes, which provide a solid foundation for further research on the growth regulation mechanisms and genetic engineering breeding of B. striata.
Affiliation(s)
- Lin Li
- Department of Cell Biology, Zunyi Medical University, Zunyi, China
| | - Houbo Liu
- Department of Cell Biology, Zunyi Medical University, Zunyi, China
| | - Weie Wen
- Department of Cell Biology, Zunyi Medical University, Zunyi, China
| | - Ceyin Huang
- Department of Cell Biology, Zunyi Medical University, Zunyi, China
| | - Xiaomei Li
- Department of Cell Biology, Zunyi Medical University, Zunyi, China
| | - Shiji Xiao
- School of Pharmacy, Zunyi Medical University, Zunyi, China
| | - Mingkai Wu
- Institute of Modern Chinese Herbal of Guizhou Academy of Agricultural Sciences, Guiyang, China
| | - Junhua Shi
- The Department of Imaging, Affiliated Hospital of Zunyi Medical University, Zunyi, China
| | - Delin Xu
- Department of Cell Biology, Zunyi Medical University, Zunyi, China
| |
|
297
|
Chromosome-Scale Assembly and Annotation of the Macadamia Genome (Macadamia integrifolia HAES 741). G3-GENES GENOMES GENETICS 2020; 10:3497-3504. [PMID: 32747341 PMCID: PMC7534425 DOI: 10.1534/g3.120.401326] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/02/2022]
Abstract
Macadamia integrifolia is a representative of the large basal eudicot family Proteaceae and the main progenitor species of the Australian native nut crop macadamia. Since its commercialisation in Hawaii fewer than 100 years ago, global production has expanded rapidly. However, genomic resources are limited in comparison to other horticultural crops. The first draft assembly of M. integrifolia had good coverage of the functional gene space but its high fragmentation has restricted its use in comparative genomics and association studies. Here we have generated an improved assembly of cultivar HAES 741 (4,094 scaffolds, 745 Mb, N50 413 kb) using a combination of Illumina paired and PacBio long read sequences. Scaffolds were anchored to 14 pseudo-chromosomes using seven genetic linkage maps. This assembly has improved contiguity and coverage, with >120 Mb of additional sequence. Following annotation, 34,274 protein-coding genes were predicted, representing 90% of the expected gene content. Our results indicate that the macadamia genome is repetitive and heterozygous. The total repeat content was 55% and genome-wide heterozygosity, estimated by read mapping, was 0.98% or an average of one SNP per 102 bp. This is the first chromosome-scale genome assembly for macadamia and the Proteaceae. It is expected to be a valuable resource for breeding, gene discovery, conservation and evolutionary genomics.
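The two heterozygosity figures in this abstract are the same quantity expressed two ways: a per-base rate and its reciprocal spacing. The sketch below shows the conversion; the variable names are illustrative, and only the 0.98% value comes from the abstract.

```python
# Genome-wide heterozygosity reported in the abstract, as a percentage of bases.
heterozygosity_percent = 0.98

# Average spacing between heterozygous sites is the reciprocal of the rate:
# 0.98 SNPs per 100 bp is one SNP every 100 / 0.98 bp.
bp_per_snp = 100 / heterozygosity_percent

print(round(bp_per_snp))
```

The result rounds to 102 bp, matching the spacing quoted in the abstract.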
|
298
|
Chen D, Du Y, Fan X, Zhu Z, Jiang H, Wang J, Fan Y, Chen H, Zhou D, Xiong C, Zheng Y, Xu X, Luo Q, Guo R. Reconstruction and functional annotation of Ascosphaera apis full-length transcriptome utilizing PacBio long reads combined with Illumina short reads. J Invertebr Pathol 2020; 176:107475. [PMID: 32976816 DOI: 10.1016/j.jip.2020.107475] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/25/2019] [Revised: 08/26/2020] [Accepted: 09/16/2020] [Indexed: 01/19/2023]
Abstract
Ascosphaera apis is a widespread fungal pathogen of honeybee larvae that results in chalkbrood disease, leading to heavy losses for the beekeeping industry in China and many other countries. This work was aimed at generating a full-length transcriptome of A. apis using PacBio single-molecule real-time (SMRT) sequencing. Here, more than 23.97 Gb of clean reads was generated from long-read sequencing of A. apis mycelia, including 464,043 circular consensus sequences (CCS) and 394,142 full-length non-chimeric (FLNC) reads. In total, we identified 174,095 high-confidence transcripts covering 5141 known genes with an average length of 2728 bp. We also discovered 2405 genic loci and 11,623 isoforms that have not yet been annotated within the current reference genome. Additionally, 16,049, 10,682, 4520 and 7253 of the discovered transcripts have annotations in the Non-redundant protein (Nr), Clusters of Eukaryotic Orthologous Groups (KOG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. Moreover, 1205 long non-coding RNAs (lncRNAs) were identified, which have fewer exons, shorter exon and intron lengths, shorter transcript lengths, lower GC percent, lower expression levels, and fewer alternative splicing (AS) events, compared with protein-coding transcripts. A total of 253 members from 17 transcription factor (TF) families were identified from our transcript datasets. Finally, the expression of A. apis isoforms was validated using a molecular approach. Overall, this is the first report of a full-length transcriptome of entomogenous fungi including A. apis. Our data offer a comprehensive set of reference transcripts and hence contribute to improving the genome annotation and transcriptomic study of A. apis.
Affiliation(s)
- Dafu Chen
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Yu Du
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Xiaoxue Fan
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Zhiwei Zhu
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Haibin Jiang
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Jie Wang
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Yuanchan Fan
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Huazhi Chen
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Dingding Zhou
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Cuiling Xiong
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Yanzhen Zheng
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China
| | - Xijian Xu
- Jiangxi Province Institute of Apiculture, 330201 Nanchang, Jiangxi, China
| | - Qun Luo
- Jiangxi Province Institute of Apiculture, 330201 Nanchang, Jiangxi, China
| | - Rui Guo
- College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, 350002 Fuzhou, Fujian, China; Engineering Research Center of Processing and Application of Bee Products of Ministry of Education, Fuzhou 350002, Fujian Province, China.
| |
|
299
|
Rautiainen M, Marschall T. GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020; 21:253. [PMID: 32972461 PMCID: PMC7513500 DOI: 10.1186/s13059-020-02157-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 10/18/2019] [Accepted: 08/26/2020] [Indexed: 02/07/2023]
Abstract
Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner.
Affiliation(s)
- Mikko Rautiainen
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany.
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, 66123, Germany.
- Saarbrücken Graduate School for Computer Science, Saarland Informatics Campus E1.3, Saarbrücken, 66123, Germany.
| | - Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, Düsseldorf, 40225, Germany.
| |
|
300
|
Full-length transcriptome sequencing combined with RNA-seq analysis revealed the immune response of fat greenling (Hexagrammos otakii) to Vibrio harveyi in early infection. Microb Pathog 2020; 149:104527. [PMID: 32980468 DOI: 10.1016/j.micpath.2020.104527] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/24/2020] [Revised: 09/20/2020] [Accepted: 09/21/2020] [Indexed: 01/20/2023]
Abstract
Fat greenling (Hexagrammos otakii) is an important commercial marine fish species cultured in northeast Asia, but its available gene sequences are limited. Vibrio harveyi is a causative agent of vibriosis in fat greenling and also causes severe losses to the aquaculture industry in China. In order to obtain more high-quality transcript information and investigate the early immune response of fat greenling against V. harveyi, the fish were artificially infected with V. harveyi, and five sampling points were set within 48 h. Iso-Seq combined with RNA-Seq was applied in a comprehensive transcriptome analysis of V. harveyi-infected fat greenling. A total of 42,225 consensus isoforms were successfully extracted from the Iso-Seq results, and more than 19,000 ORFs were predicted. In addition, WGCNA identified three modules that were significantly positively correlated with infection time, and KEGG analysis showed that the immune-related genes in these modules were mainly enriched in the TLR signaling pathway, the NF-κB signaling pathway and endocytosis. The activation of inflammation and endocytosis was the most significant characteristic of the fat greenling immune response during early infection. Based on the WGCNA, a series of high-degree nodes in the networks were identified as hub genes. The protein structures of cold-inducible RNA-binding protein (CIRBP), poly [ADP-ribose] polymerase 1 (PARP1) and protein arginine N-methyl transferase 1 (PRMT1) were subsequently found to be highly conserved in vertebrates, and the gene expression patterns of CIRBP, PARP1, PRMT1 and some TLR/NF-κB pathway-related genes indicated that these proteins might have similar biological functions in the regulation of the inflammatory response in teleost fish. The results of this study provide the first systematic full-length transcriptome profile of fat greenling and characterize its immune responses in early infection with V. harveyi, which will serve as the foundation for further exploring the molecular mechanism of immune defense against bacterial infection in fat greenling.
|