1
Cai Z, Liu S, Wang W, Wang R, Miao X, Song P, Shan B, Wang L, Li Y, Lin L. Comparative transcriptome sequencing analysis of female and male Decapterus macrosoma. PeerJ 2022; 10:e14342. [PMID: 36389430 PMCID: PMC9651050 DOI: 10.7717/peerj.14342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 10/14/2022] [Indexed: 11/11/2022] [Open access]
Abstract
Sexual growth dimorphism is a common phenomenon in teleost fish and has led to many reproductive strategies. Research on growth- and sex-related genes in teleost fish would broaden our understanding of this process. In this study, transcriptome sequencing of the shortfin scad Decapterus macrosoma was performed for the first time, and a high-quality reference transcriptome was constructed. After identification and assembly, a total of 58,475 nonredundant unigenes were obtained with an N50 length of 2,266 bp, and 28,174 unigenes were successfully annotated against multiple public databases. BUSCO analysis indicated 92.9% completeness for the assembled transcriptome. Gene expression analysis revealed 2,345 differentially expressed genes (DEGs) between female and male D. macrosoma, of which 1,150 were female-biased and 1,195 were male-biased. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses showed that the DEGs were mainly involved in biological processes including protein synthesis, growth, rhythmic processes, immune defense, and vitellogenesis. We then identified many growth- and sex-related genes, including Igf, Fabps, EF-hand family genes, Zp3, Zp4 and Vg. In addition, a total of 19,573 simple sequence repeats (SSRs) were identified in the transcriptome sequences. These results provide valuable information on growth- and sex-related genes and should facilitate further exploration of the molecular mechanism of sexual growth dimorphism.
Affiliation(s)
- Zizi Cai
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Shigang Liu
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Wei Wang
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Rui Wang
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Xing Miao
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Puqing Song
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Binbin Shan
- Key Laboratory of Marine Ranching, Ministry of Agriculture and Rural Affairs, Guangzhou, China
- Liangming Wang
- Key Laboratory of Marine Ranching, Ministry of Agriculture and Rural Affairs, Guangzhou, China
- Yuan Li
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China; Key Laboratory of Marine Ecological Conservation and Restoration, Ministry of Natural Resources, Xiamen, China
- Longshan Lin
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China; Key Laboratory of Marine Ecological Conservation and Restoration, Ministry of Natural Resources, Xiamen, China
2
Shan B, Liu Y, Yang C, Zhao Y, Sun D. Comparative transcriptomic analysis for identification of candidate sex-related genes and pathways in Crimson seabream (Parargyrops edita). Sci Rep 2021; 11:1077. [PMID: 33441831 PMCID: PMC7806868 DOI: 10.1038/s41598-020-80282-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/13/2020] [Accepted: 12/18/2020] [Indexed: 01/29/2023] [Open access]
Abstract
Teleost fishes display the largest array of sex-determining systems among animals, resulting in various reproductive strategies. Research on sex-related genes in teleosts will broaden our understanding of the process and provide important insight into the plasticity of sex determination in vertebrates in general. Crimson seabream (Parargyrops edita Tanaka, 1916) is one of the most valuable and abundant fish resources throughout Asia. However, little genomic information on P. edita is available. In the present study, the transcriptomes of male and female P. edita were sequenced with RNA-seq technology. A total of 388,683,472 reads were generated from the libraries. After filtering and assembly, a total of 79,775 nonredundant unigenes were obtained with an N50 of 2,921 bp. The unigenes were annotated with multiple public databases, including the NT (53,556, 67.13%), NR (54,092, 67.81%), Swiss-Prot (45,265, 56.74%), KOG (41,274, 51.74%), KEGG (46,302, 58.04%), and GO (11,056, 13.86%) databases. Comparison of the unigenes of the two sexes of P. edita revealed that 11,676 unigenes (9,335 in females, 2,341 in males) were differentially expressed between males and females. Of these, 5,463 were specifically expressed in females, and 1,134 were specifically expressed in males. In addition, the expression levels of ten unigenes were confirmed by qRT-PCR to validate the transcriptomic data. Moreover, 34,473 simple sequence repeats (SSRs) were identified in SSR-containing sequences, and 50 loci were randomly selected for primer development. Of these, 36 loci were successfully amplified, and 19 loci were polymorphic. Finally, our comparative analysis identified many sex-related genes (zps, amh, gsdf, sox4, cyp19a, etc.) and pathways (MAPK signaling pathway, p53 signaling pathway, etc.) of P. edita. This informative transcriptomic analysis provides valuable data to increase the genomic resources of P. edita. The results will be useful for clarifying the molecular mechanism of sex determination and for future functional analyses of sex-associated genes.
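The annotation rates quoted in this abstract are each database's annotated-unigene count divided by the 79,775 assembled unigenes. As a quick consistency check, the following Python sketch recomputes them from the counts given in the abstract. The dictionary and function names are illustrative assumptions and do not come from the study.

```python
# Counts are copied from the abstract; all names below are hypothetical
# and exist only for this illustration.
TOTAL_UNIGENES = 79_775

ANNOTATED = {
    "NT": 53_556,
    "NR": 54_092,
    "Swiss-Prot": 45_265,
    "KOG": 41_274,
    "KEGG": 46_302,
    "GO": 11_056,
}


def annotation_rate(annotated: int, total: int = TOTAL_UNIGENES) -> float:
    """Return the percentage of unigenes annotated, rounded to two decimals."""
    return round(100.0 * annotated / total, 2)


if __name__ == "__main__":
    for database, count in ANNOTATED.items():
        print(f"{database}: {count} / {TOTAL_UNIGENES} = {annotation_rate(count):.2f}%")
```

Each recomputed value agrees with the percentage reported in the abstract to two decimal places.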
Affiliation(s)
- Binbin Shan
- Key Laboratory of South China Sea Fishery Resources Exploitation & Utilization, Ministry of Agriculture Rural Affairs, Guangzhou, China
- Guangdong Provincial Key Laboratory of Fishery Ecology and Environment, Guangzhou, China
- South China Sea Fisheries Research Institute, Chinese Academy of Fisheries Sciences, Guangzhou, China
- Yan Liu
- Key Laboratory of South China Sea Fishery Resources Exploitation & Utilization, Ministry of Agriculture Rural Affairs, Guangzhou, China
- Guangdong Provincial Key Laboratory of Fishery Ecology and Environment, Guangzhou, China
- South China Sea Fisheries Research Institute, Chinese Academy of Fisheries Sciences, Guangzhou, China
- Changping Yang
- Key Laboratory of South China Sea Fishery Resources Exploitation & Utilization, Ministry of Agriculture Rural Affairs, Guangzhou, China
- Guangdong Provincial Key Laboratory of Fishery Ecology and Environment, Guangzhou, China
- South China Sea Fisheries Research Institute, Chinese Academy of Fisheries Sciences, Guangzhou, China
- Yu Zhao
- Key Laboratory of South China Sea Fishery Resources Exploitation & Utilization, Ministry of Agriculture Rural Affairs, Guangzhou, China
- Guangdong Provincial Key Laboratory of Fishery Ecology and Environment, Guangzhou, China
- South China Sea Fisheries Research Institute, Chinese Academy of Fisheries Sciences, Guangzhou, China
- Dianrong Sun
- Key Laboratory of South China Sea Fishery Resources Exploitation & Utilization, Ministry of Agriculture Rural Affairs, Guangzhou, China.
- Guangdong Provincial Key Laboratory of Fishery Ecology and Environment, Guangzhou, China.
- South China Sea Fisheries Research Institute, Chinese Academy of Fisheries Sciences, Guangzhou, China.
3
Han C, Li Q, Chen Q, Zhou G, Huang J, Zhang Y. Transcriptome analysis of the spleen provides insight into the immunoregulation of Mastacembelus armatus under Aeromonas veronii infection. Fish & Shellfish Immunology 2019; 88:272-283. [PMID: 30772397 DOI: 10.1016/j.fsi.2019.02.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/08/2018] [Revised: 02/07/2019] [Accepted: 02/13/2019] [Indexed: 06/09/2023]
Abstract
Mastacembelus armatus, also known as the zigzag eel, is an economically important species of freshwater fish that is very popular with consumers as a high-grade table fish in China. Recently, the wild population of this fish has declined gradually due to overfishing and various types of ecological imbalance. Meanwhile, the aquaculture of this spiny eel has flourished in southern China. To understand the immune response of zigzag eel to Aeromonas veronii, we carried out transcriptome sequencing of zigzag eel spleens after artificial bacterial infection. After assembly, 110,328 unigenes were obtained with 44.42% GC content. A total of 27,098 unigenes were successfully annotated by four public protein databases, namely, Nr, UniProt, KEGG and KOG. Differential expression analysis revealed the existence of 1278 significantly differentially expressed unigenes at 24 h post infection, with 767 unigenes upregulated and 511 unigenes downregulated. After GO and KEGG enrichment analyses, many immune-related GO categories and pathways were significantly enriched. The typical significantly enriched pathways included toll-like receptor signaling pathway, cytokine-cytokine receptor interaction and TNF signaling pathway. In addition, 40,027 microsatellites (SSRs) and 52,716 candidate single nucleotide polymorphisms (SNPs) were identified from the infection and control transcriptome libraries. Overall, this transcriptomic analysis provided valuable information for studying the immune response of zigzag eels against bacterial infection.
Affiliation(s)
- Chong Han
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Provincial Key Laboratory for Aquatic Economic Animals, School of Life Sciences, Sun Yat-Sen University, Guangzhou, PR China
- Qiang Li
- School of Life Sciences, Guangzhou University, Guangzhou, PR China
- Qinghua Chen
- South China Institute of Environmental Science, MEP, Guangzhou, PR China
- Guofeng Zhou
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Provincial Key Laboratory for Aquatic Economic Animals, School of Life Sciences, Sun Yat-Sen University, Guangzhou, PR China
- Jianrong Huang
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Provincial Key Laboratory for Aquatic Economic Animals, School of Life Sciences, Sun Yat-Sen University, Guangzhou, PR China.
- Yong Zhang
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Provincial Key Laboratory for Aquatic Economic Animals, School of Life Sciences, Sun Yat-Sen University, Guangzhou, PR China.
4
Kerima OZ, Niranjana P, Vinay Kumar B, Ramachandrappa R, Puttappa S, Lalitha Y, Jalali SK, Ballal CR, Thulasiram HV. De novo transcriptome analysis of the egg parasitoid Trichogramma chilonis Ishii (Hymenoptera: Trichogrammatidae): A biological control agent. Gene Reports 2018. [DOI: 10.1016/j.genrep.2018.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
5
Sari E, Bhadauria V, Ramsay L, Borhan MH, Lichtenzveig J, Bett KE, Vandenberg A, Banniza S. Defense responses of lentil (Lens culinaris) genotypes carrying non-allelic ascochyta blight resistance genes to Ascochyta lentis infection. PLoS One 2018; 13:e0204124. [PMID: 30235263 PMCID: PMC6147436 DOI: 10.1371/journal.pone.0204124] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/03/2018] [Accepted: 09/03/2018] [Indexed: 12/24/2022] [Open access]
Abstract
Ascochyta blight of lentil is an important fungal disease in many lentil-producing regions of the world, causing major yield and grain quality losses. Quick shifts in the aggressiveness of the population of the causal agent Ascochyta lentis mandate the development of germplasm with novel and durable resistance. In the absence of complete resistance, lentil genotypes CDC Robin and 964a-46 have frequently been used as sources of partial resistance to ascochyta blight and carry non-allelic ascochyta blight resistance genes. RNA-seq analysis was conducted to identify differences in the transcriptomes of CDC Robin, 964a-46 and the susceptible check Eston after inoculation with A. lentis. Candidate defense genes differentially expressed among the genotypes had hypothetical functions in various layers of plant defense, including pathogen recognition, phytohormone signaling pathways and downstream defense responses. Upon A. lentis infection, CDC Robin and 964a-46 activated cell surface receptors (e.g. receptor-like kinases) tentatively associated with recognition of pathogen-associated molecular patterns (PAMPs) and nucleotide-binding site leucine-rich repeat (NBS-LRR) receptors associated with intracellular effector recognition, and differed in their activation of salicylic acid, abscisic acid and jasmonic acid/ethylene signal transduction pathways. These differences were reflected in the differential expression of downstream defense responses such as pathogenesis-related proteins and genes associated with the induction of cell death and cell-wall reinforcement. A significant correlation between the expression levels of a selection of genes measured by quantitative real-time PCR and their expression levels estimated through RNA-seq demonstrated the technical and analytical accuracy of RNA-seq for identifying genes differentially expressed among genotypes. The presence of different resistance mechanisms in 964a-46 and CDC Robin indicates their value for gene pyramiding, leading to more durable resistance to ascochyta blight.
Affiliation(s)
- Ehsan Sari
- Department of Plant Sciences/Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Vijai Bhadauria
- Department of Plant Sciences/Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Larissa Ramsay
- Department of Plant Sciences/Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- M. Hossein Borhan
- Agriculture and Agri-Food Canada, Saskatoon Research and Development Centre, Saskatoon, Saskatchewan, Canada
- Judith Lichtenzveig
- School of Agriculture and Environment, University of Western Australia, Perth, Western Australia, Australia
- Kirstin E. Bett
- Department of Plant Sciences/Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Albert Vandenberg
- Department of Plant Sciences/Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Sabine Banniza
- Department of Plant Sciences/Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
6
Armero A, Baudouin L, Bocs S, This D. Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut. PLoS One 2017; 12:e0173300. [PMID: 28334050 PMCID: PMC5363918 DOI: 10.1371/journal.pone.0173300] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/31/2016] [Accepted: 02/17/2017] [Indexed: 01/20/2023] [Open access]
Abstract
The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main palm species present different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful for answering functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed the high sensitivity and specificity of our strategy, which was relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often recovering 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of which derived from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and a substantial number of metabolic pathways related to secondary metabolism. The new sequences found with the BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to obtain a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).
Affiliation(s)
- Alix Armero
- Montpellier SupAgro, UMR AGAP, Montpellier, France
- Stéphanie Bocs
- CIRAD, UMR AGAP, Montpellier, France
- South Green Bioinformatics Platform, Montpellier, France
7
Characterization of the global transcriptome and microsatellite marker information for spotted halibut Verasper variegatus. Genes Genomics 2016. [DOI: 10.1007/s13258-016-0496-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/03/2023]
8
Jia BY, Ba HX, Wang GW, Yang Y, Cui XZ, Peng YH, Zheng JJ, Xing XM, Yang FH. Transcriptome analysis of sika deer in China. Mol Genet Genomics 2016; 291:1941-53. [PMID: 27423230 DOI: 10.1007/s00438-016-1231-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/02/2016] [Accepted: 07/11/2016] [Indexed: 12/17/2022]
Abstract
Sika deer are of great commercial value because their antlers are used in tonics and alternative medicine and their meat is healthy and delicious. The goal of this study was to generate transcript sequences from sika deer for functional genomic analyses and to identify transcripts that show tissue-specific, age-dependent differential expression patterns. These sequences could enhance our understanding of the molecular mechanisms underlying sika deer growth and development. In the present study, we performed de novo transcriptome assembly and profiling analysis across ten tissue types and four developmental stages (juvenile, adolescent, adult, and aged) of sika deer, using Illumina paired-end tag (PET) sequencing technology. A total of 1,752,253 contigs with an average length of 799 bp were generated, from which 1,348,618 unigenes with an average length of 590 bp were defined. Approximately 33.2% of these (447,931 unigenes) were then annotated in public protein databases. Many sika deer tissue-specific, age-dependent unigenes were identified. The testes had the largest number of tissue-enriched unigenes, and some of them were prone to develop new functions for other tissues. Additionally, our transcriptome revealed that the juvenile-adolescent transition was the most complex and important stage of the sika deer life cycle. The present work represents the first multiple-tissue transcriptome analysis of sika deer across four developmental stages. The generated data not only provide a functional genomics resource for future biological research on sika deer but also guide the selection and manipulation of genes controlling growth and development.
Affiliation(s)
- Bo-Yin Jia
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China
- Heng-Xing Ba
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China
- Gui-Wu Wang
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China
- Ying Yang
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China
- Xue-Zhe Cui
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China
- Ying-Hua Peng
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China
- Jun-Jun Zheng
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China
- Xiu-Mei Xing
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China
- Fu-He Yang
- State Key Laboratory for Molecular Biology of Special Economical Animals, Institute of Special Economic Animals and Plants, Chinese Academy of Agricultural Sciences, 4899 Juye Street, Changchun, 130112, China.
9
Ma D, Ma A, Huang Z, Wang G, Wang T, Xia D, Ma B. Transcriptome Analysis for Identification of Genes Related to Gonad Differentiation, Growth, Immune Response and Marker Discovery in The Turbot (Scophthalmus maximus). PLoS One 2016; 11:e0149414. [PMID: 26925843 PMCID: PMC4771204 DOI: 10.1371/journal.pone.0149414] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/20/2015] [Accepted: 02/01/2016] [Indexed: 11/18/2022] [Open access]
Abstract
Background: Turbot Scophthalmus maximus is an economically important species extensively aquacultured in China. A genetic selection program is urgently needed for the sustainable development of this industry, and it requires increasing genomic background knowledge. Transcriptome sequencing is an excellent alternative way to identify transcripts involved in specific biological processes and to exploit a considerable quantity of molecular markers when no genome sequences are available. In this study, a comprehensive transcript dataset for the major tissues of S. maximus was produced on an Illumina platform. Results: Total RNA was isolated from liver, spleen, kidney, cerebrum, gonad (testis and ovary) and muscle. Equal quantities of RNA from each tissue type were pooled to construct two cDNA libraries (male and female). Using Illumina paired-end sequencing technology, nearly 44.22 million 100-bp clean reads were generated and then assembled into 106,643 contigs, of which 71,107 were designated unigenes, with an average length of 892 bp, after the elimination of redundancies. Of these, 24,052 unigenes (33.83% of the total) were successfully annotated. GO, KEGG pathway mapping and COG analyses were performed to predict potential genes and their functions. Based on our sequence analysis and the published literature, many candidate genes with fundamental roles in sex determination and gonad differentiation (dmrt1), growth (ghrh, myf5, prl/prlr) and immune response (TLR1/TLR21/TLR22, IL-15/IL-34) were identified for the first time in this species. In addition, a large number of credible genetic markers, including 21,192 SSRs and 8,642 SNPs, were identified in the present dataset. Conclusion: This informative transcriptome provides valuable new data to increase the genomic resources of Scophthalmus maximus. Future studies of the corresponding gene functions will be very useful for the management of reproduction, growth and disease control in turbot aquaculture breeding programs. The molecular markers identified in this dataset will aid genetic linkage analyses, the mapping of quantitative trait loci, and the acceleration of marker-assisted selection programs.
Affiliation(s)
- Deyou Ma
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao Key Laboratory for Marine Fish Breeding and Biotechnology, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China
- Dalian Ocean University, Dalian, 116023, China
- Aijun Ma
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao Key Laboratory for Marine Fish Breeding and Biotechnology, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China
- Zhihui Huang
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao Key Laboratory for Marine Fish Breeding and Biotechnology, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China
- Guangning Wang
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao Key Laboratory for Marine Fish Breeding and Biotechnology, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China
- Ting Wang
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao Key Laboratory for Marine Fish Breeding and Biotechnology, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China
- Dandan Xia
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao Key Laboratory for Marine Fish Breeding and Biotechnology, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China
- Benhe Ma
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao Key Laboratory for Marine Fish Breeding and Biotechnology, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China
10
Wang K, del Castillo C, Corre E, Pales Espinosa E, Allam B. Clam focal and systemic immune responses to QPX infection revealed by RNA-seq technology. BMC Genomics 2016; 17:146. [PMID: 26921237 PMCID: PMC4769524 DOI: 10.1186/s12864-016-2493-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 10/06/2015] [Accepted: 02/17/2016] [Indexed: 12/31/2022] [Open access]
Abstract
Background: The hard clam Mercenaria mercenaria is an important seafood species widely exploited along the eastern coast of the United States and plays a crucial role in coastal ecology and economy. Severe hard clam mortalities have been associated with the protistan parasite QPX (Quahog Parasite Unknown). QPX infection establishes in pallial organs, with lesions typically characterized as nodules, which represent inflammatory masses formed by hemocyte infiltration and encapsulation of parasites. QPX infection is known to induce host changes both at the whole-organism level and at specific lesion areas, which imply systemic and focal defense responses, respectively. However, little is known about the molecular mechanisms underlying these alterations. Results: RNA-seq was performed using Illumina HiSeq 2000 (641 million 100-bp reads) to characterize M. mercenaria focal and systemic immune responses to QPX. Transcripts were assembled and expression levels were compared between nodule and healthy tissues from infected clams, and between these and tissues from healthy clams. De novo assembly reconstructed a consensus transcriptome of 62,980 sequences that was functionally annotated. A total of 3,131 transcripts were identified as differentially expressed in different tissues. The results allowed the identification of host immune factors implicated in the systemic and focal responses against QPX and revealed the pathways involved in parasite neutralization. Among transcripts significantly modulated upon host-pathogen interactions, those involved in non-self recognition, signal transduction and defense response were over-represented. Alterations in pathways regulating hemocyte focal adhesion, migration and apoptosis were also demonstrated. Conclusions: Our study is the first attempt to thoroughly characterize the M. mercenaria transcriptome and identify molecular features associated with QPX infection. It is also one of the first studies contrasting focal and systemic responses to infections in invertebrates using high-throughput sequencing. The results identified the molecular signatures of clam systemic and focal defense responses, which collectively mediate immune processes such as hemocyte recruitment and local inflammation. These investigations improve our understanding of bivalve immunity and provide molecular targets for probing the biological bases of clam resistance to QPX. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2493-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kailai Wang
- School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, NY, 11794-5000, USA.
- Carmelo del Castillo
- School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, NY, 11794-5000, USA.
- Erwan Corre
- Analyses and Bioinformatics for Marine Science, Station Biologique de Roscoff, 29688, Roscoff Cedex, France.
- Emmanuelle Pales Espinosa
- School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, NY, 11794-5000, USA.
| | - Bassem Allam
- School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, NY, 11794-5000, USA.
| |
Collapse
|
11
|
Honaas LA, Wafula EK, Wickett NJ, Der JP, Zhang Y, Edger PP, Altman NS, Pires JC, Leebens-Mack JH, dePamphilis CW. Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome. PLoS One 2016; 11:e0146062. [PMID: 26731733 PMCID: PMC4701411 DOI: 10.1371/journal.pone.0146062] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 09/16/2015] [Accepted: 12/11/2015] [Indexed: 12/29/2022] Open
Abstract
Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies, generated with 6 de novo assemblers including CLC, Trinity, SOAP, Oases, ABySS and NextGENe. Controlled analyses of de novo assemblies for Arabidopsis thaliana and Oryza sativa transcriptomes provide new insights into the strengths and limitations of transcriptome assembly strategies. We find that the leading assemblers generate reassuringly accurate assemblies for the majority of transcripts. At the same time, we find a propensity for assemblers to fail to fully assemble highly expressed genes. Surprisingly, the incidence of true chimeric assemblies is very low for all assemblers. Normalized libraries are reduced in highly abundant transcripts, but they also lack thousands of low-abundance transcripts. We conclude that the quality of de novo transcriptome assemblies is best assessed through consideration of a combination of metrics: 1) proportion of reads mapping to an assembly, 2) recovery of conserved, widely expressed genes, 3) N50 length statistics, and 4) the total number of unigenes. We provide benchmark Illumina transcriptome data and introduce SCERNA, a broadly applicable modular protocol for de novo assembly improvement. Finally, our de novo assembly of the Arabidopsis leaf transcriptome revealed ~20 putative Arabidopsis genes lacking in the current annotation.
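Of the four metrics listed in this abstract, N50 is the one most often quoted and most often misread, so a small illustration may help. The Python sketch below is not taken from the paper; the function name `n50` and the example lengths are assumptions made for illustration. It uses the common definition: the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or unigene) lengths.

    Illustrative helper, not code from the cited study.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly length defines N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly: total length 1500, half is 750.
# 500 + 400 = 900 >= 750, so N50 is 400.
print(n50([100, 200, 300, 400, 500]))
```

Because N50 is computed from lengths alone, it can rise when an assembler merges transcripts incorrectly, which is consistent with the abstract's advice to read it alongside read mapping rate and conserved-gene recovery rather than on its own.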
Affiliation(s)
- Loren A Honaas
  - Biology Department, Penn State, University Park, Pennsylvania, 16802, United States of America
- Eric K Wafula
  - Biology Department, Penn State, University Park, Pennsylvania, 16802, United States of America
- Norman J Wickett
  - Biology Department, Penn State, University Park, Pennsylvania, 16802, United States of America
- Joshua P Der
  - Biology Department, Penn State, University Park, Pennsylvania, 16802, United States of America
- Yeting Zhang
  - Biology Department, Penn State, University Park, Pennsylvania, 16802, United States of America
- Patrick P Edger
  - Division of Biological Sciences, University of Missouri, Columbia, Missouri, 65211, United States of America
- Naomi S Altman
  - Department of Statistics, Penn State, University Park, Pennsylvania, 16802, United States of America
- J Chris Pires
  - Division of Biological Sciences, University of Missouri, Columbia, Missouri, 65211, United States of America
- James H Leebens-Mack
  - Department of Plant Biology, University of Georgia, Athens, Georgia, 30602, United States of America
- Claude W dePamphilis
  - Biology Department, Penn State, University Park, Pennsylvania, 16802, United States of America
|
12
|
Pan B, Ren Y, Gao J, Gao H. De novo RNA-Seq analysis of the venus clam, Cyclina sinensis, and the identification of immune-related genes. PLoS One 2015; 10:e0123296. [PMID: 25853714 PMCID: PMC4390376 DOI: 10.1371/journal.pone.0123296] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/02/2014] [Accepted: 02/17/2015] [Indexed: 11/24/2022] Open
Abstract
The Venus clam, Cyclina sinensis, is one of the most important bivalves in China. In recent years, increasingly widespread disease has occurred in breeding areas, imposing significant losses on the national economy. To understand the molecular mechanisms of immune-related genes, we sequenced and analyzed hemolymph samples from clams injected with two pathogenic microorganisms, using the Illumina MiSeq system. After trimming, more than 12 M PE reads with an average length greater than 410 bp were assembled into 70,079 transcripts with a mean length of 980 bp. Using a homology analysis, 102 potentially immune-related genes (135 transcripts) were identified, and most of them exhibited a similar pattern in both samples. These data indicated that the response of the clam to both types of bacterial infection might follow a similar molecular mechanism. Using the TreeFam method, 9,904 gene families and 1,031 unique families of the clam were preliminarily classified in comparison to five related species. A significant number of SSRs were identified, which could facilitate the identification of polymorphisms in Venus clam populations. These datasets will improve our knowledge of the molecular mechanisms driving the immune response to bacterial infection in clam populations and will provide basic data for clam breeding and disease control.
Affiliation(s)
- Baoping Pan
  - College of Life Sciences, Tianjin Key Laboratory of Animal and Plant Resistance, Tianjin Normal University, Tianjin, P. R. China 300387
- Yipeng Ren
  - College of Life Sciences, Tianjin Key Laboratory of Animal and Plant Resistance, Tianjin Normal University, Tianjin, P. R. China 300387
- Jing Gao
  - College of Life Sciences, Tianjin Key Laboratory of Animal and Plant Resistance, Tianjin Normal University, Tianjin, P. R. China 300387
- Hong Gao
  - College of Life Sciences, Tianjin Key Laboratory of Animal and Plant Resistance, Tianjin Normal University, Tianjin, P. R. China 300387
|
13
|
Bevilacqua V, Pietroleonardo N, Giannino E, Stroppa F, Simone D, Pesole G, Picardi E. EasyCluster2: an improved tool for clustering and assembling long transcriptome reads. BMC Bioinformatics 2014; 15 Suppl 15:S7. [PMID: 25474441 PMCID: PMC4271567 DOI: 10.1186/1471-2105-15-s15-s7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Expressed sequences (e.g. ESTs) are a strong source of evidence to improve gene structures and predict reliable alternative splicing events. When a genome assembly is available, ESTs are suitable to generate gene-oriented clusters through the well-established EasyCluster software. Nowadays, EST-like sequences can be massively produced using Next Generation Sequencing (NGS) technologies. In order to handle genome-scale transcriptome data, we present here EasyCluster2, a reimplementation of EasyCluster able to speed up the creation of gene-oriented clusters and facilitate downstream analyses such as the assembly of full-length transcripts and the detection of splicing isoforms. RESULTS EasyCluster2 has been developed to facilitate the genome-based clustering of EST-like sequences generated through the NGS 454 technology. Reads mapped onto the reference genome can be uploaded using the standard GFF3 file format. Alignment parsing is initially performed to produce a first collection of pseudo-clusters by grouping reads according to the overlap of their genomic coordinates on the same strand. EasyCluster2 then refines read grouping by including in each cluster only reads sharing at least one splice site and optionally performs a Smith-Waterman alignment in the region surrounding splice sites in order to correct for potential alignment errors. In addition, EasyCluster2 can include unspliced reads, which generally account for >50% of 454 datasets, and collapses overlapping clusters. Finally, EasyCluster2 can assemble full-length transcripts using a Directed-Acyclic-Graph-based strategy, simplifying the identification of alternative splicing isoforms, thanks also to the implementation of the widespread AStalavista methodology. Accuracy and performance have been tested on real as well as simulated datasets. 
CONCLUSIONS EasyCluster2 represents a unique tool to cluster and assemble transcriptome reads produced with 454 technology, as well as ESTs and full-length transcripts. The clustering procedure is enhanced with the employment of genome annotations and unspliced reads. Overall, EasyCluster2 is able to perform an effective detection of splicing isoforms, since it can refine exon-exon junctions and explore alternative splicing without known reference transcripts. Results in GFF3 format can be browsed in the UCSC Genome Browser. Therefore, EasyCluster2 is a powerful tool to generate reliable clusters for gene expression studies, and it makes the analysis accessible to researchers not skilled in bioinformatics.
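The first clustering step described in this abstract, grouping mapped reads whose genomic coordinates overlap on the same strand, can be sketched in a few lines of Python. This is an illustration of the general interval-grouping idea only, not EasyCluster2's code: the function name `pseudo_clusters`, the tuple layout of a read, and the use of inclusive coordinates are all assumptions, and the later refinement steps (shared splice sites, Smith-Waterman correction, DAG assembly) are not modeled.

```python
from collections import defaultdict


def pseudo_clusters(reads):
    """Group reads whose genomic spans overlap on the same strand.

    `reads` is an iterable of (read_id, chrom, strand, start, end)
    tuples with inclusive coordinates (a hypothetical layout chosen
    for this sketch). Returns a list of ((chrom, strand), [read_ids]).
    """
    by_strand = defaultdict(list)
    for read_id, chrom, strand, start, end in reads:
        by_strand[(chrom, strand)].append((start, end, read_id))

    clusters = []
    for key in sorted(by_strand):
        current, current_end = [], None
        # Sweep reads left to right; a read starting beyond the furthest
        # end seen so far cannot overlap the open cluster, so close it.
        for start, end, read_id in sorted(by_strand[key]):
            if current and start > current_end:
                clusters.append((key, current))
                current, current_end = [], None
            current.append(read_id)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            clusters.append((key, current))
    return clusters


# Hypothetical reads: r1 and r2 overlap, r3 is separate, r4 is on the
# opposite strand and therefore never joins the r1/r2 cluster.
example = [
    ("r1", "chr1", "+", 100, 200),
    ("r2", "chr1", "+", 150, 300),
    ("r3", "chr1", "+", 400, 500),
    ("r4", "chr1", "-", 120, 180),
]
for key, members in pseudo_clusters(example):
    print(key, members)
```

Sorting by start position and tracking the furthest end seen keeps the grouping to a single pass per strand, which matters when the input is a genome-scale read set.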
|
14
|
Nguyen Thanh H, Zhao L, Liu Q. De novo transcriptome sequencing analysis and comparison of differentially expressed genes (DEGs) in Macrobrachium rosenbergii in China. PLoS One 2014; 9:e109656. [PMID: 25329319 PMCID: PMC4203760 DOI: 10.1371/journal.pone.0109656] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/12/2014] [Accepted: 08/22/2014] [Indexed: 11/30/2022] Open
Abstract
Giant freshwater prawn (GFP; Macrobrachium rosenbergii) is an exotic species that was introduced into China in 1976 and thereafter became a major species in freshwater aquaculture. However, gene discovery in this species has been limited to small-scale data collection in China. Using next-generation sequencing, we sequenced the transcriptome of hepatopancreas samples from individuals of four GFP groups (A1, A2, B1 and B2). De novo transcriptome sequencing generated 66,953 isogenes. BLASTX searches against the Non-redundant (NR), Search Tool for the Retrieval of Interacting Genes (STRING), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases annotated 21,224 unigenes; 9,552 unigenes matched Gene Ontology (GO) classifications, 5,782 matched 25 categories of Clusters of Orthologous Groups of proteins (COG), and 20,859 were assigned to 312 KEGG pathways. Between the A and B groups, 147 differentially expressed genes (DEGs) were identified; between the A1 and A2 groups, 6,860 DEGs were identified; and between the B1 and B2 groups, 5,229 DEGs were identified. After enrichment, the A versus B comparison yielded 38 DEGs, none of which were significantly enriched. The A1 versus A2 comparison yielded 21,856 DEGs in three main functional categories (biological process, cellular component and molecular function); in the KEGG pathway analysis, 2,459 genes had a KEGG Ortholog ID (KO-ID) and were categorized into 251 pathways, of which 9 were significantly enriched. The B1 versus B2 comparison yielded 5,940 DEGs in the same three functional categories; in the KEGG pathway analysis, 1,543 genes had a KO-ID and were categorized into 240 pathways, of which 2 were significantly enriched. We investigated 99 GO queries related to GFP growth in the four groups. 
After enrichment, we identified 23 DEGs and one KEGG pathway (ko04711) related to GFP growth.
Affiliation(s)
- Hai Nguyen Thanh
  - Key Laboratory of Freshwater Fishery Germplasm Resources, Shanghai Ocean University, Ministry of Agriculture, Shanghai City, P. R. China
  - Vietnam Institute of Fisheries Economics and Planning, Directorate of Fisheries, Ministry of Agriculture and Rural Development of Viet Nam, Hanoi City, S.R. Vietnam
- Liangjie Zhao
  - Key Laboratory of Freshwater Fishery Germplasm Resources, Shanghai Ocean University, Ministry of Agriculture, Shanghai City, P. R. China
- Qigen Liu
  - Key Laboratory of Freshwater Fishery Germplasm Resources, Shanghai Ocean University, Ministry of Agriculture, Shanghai City, P. R. China
|
15
|
Thanh NM, Jung H, Lyons RE, Chand V, Tuan NV, Thu VTM, Mather P. A transcriptomic analysis of striped catfish (Pangasianodon hypophthalmus) in response to salinity adaptation: De novo assembly, gene annotation and marker discovery. Comparative Biochemistry and Physiology D-Genomics & Proteomics 2014; 10:52-63. [PMID: 24841517 DOI: 10.1016/j.cbd.2014.04.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 10/29/2013] [Revised: 04/16/2014] [Accepted: 04/28/2014] [Indexed: 01/25/2023]
Abstract
The striped catfish (Pangasianodon hypophthalmus) culture industry in the Mekong Delta in Vietnam has developed rapidly over the past decade. The industry now, however, faces some significant challenges, especially from climate change impacts, notably the predicted extensive saltwater intrusion into many low-lying coastal provinces across the Mekong Delta. This problem highlights a need to develop culture stocks that can tolerate more saline culture environments as saline water-intruded land expands. While a traditional artificial selection program can potentially address this need, understanding the genomic basis of salinity tolerance can assist development of more productive culture lines. The current study applied a transcriptomic approach using Ion PGM technology to generate expressed sequence tag (EST) resources from the intestine and swim bladder of striped catfish reared at a salinity level of 9 ppt, which showed the best growth performance. Total sequence data generated was 467.8 Mbp, consisting of 4,116,424 reads with an average length of 112 bp. De novo assembly generated 51,188 contigs and allowed identification of 16,116 putative genes based on the GenBank non-redundant database. GO annotation, KEGG pathway mapping, and functional annotation showed that the recovered EST sequences cover a wide diversity of biological functions and processes. In addition, more than 11,600 simple sequence repeats were detected. This is the first comprehensive analysis of a striped catfish transcriptome, and it provides a valuable genomic resource for future selective breeding programs and functional or evolutionary studies of genes that influence salinity tolerance in this important culture species.
Affiliation(s)
- Nguyen Minh Thanh
  - International University, VNU HCMC, Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Viet Nam.
- Hyungtaek Jung
  - Institute for Future Environment, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia; Science and Engineering Faculty, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
- Russell E Lyons
  - CSIRO Livestock Industries, Queensland Biosciences Precinct, QLD 4057, Australia.
- Vincent Chand
  - Science and Engineering Faculty, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
- Nguyen Viet Tuan
  - Science and Engineering Faculty, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
- Vo Thi Minh Thu
  - International University, VNU HCMC, Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Viet Nam.
- Peter Mather
  - Science and Engineering Faculty, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
|
16
|
Transcriptome analysis of the Portunus trituberculatus: de novo assembly, growth-related gene identification and marker discovery. PLoS One 2014; 9:e94055. [PMID: 24722690 PMCID: PMC3983128 DOI: 10.1371/journal.pone.0094055] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 11/19/2013] [Accepted: 03/11/2014] [Indexed: 11/19/2022] Open
Abstract
Background The swimming crab, Portunus trituberculatus, is an important farmed species in China and has attracted extensive study, which requires increasing genomic background knowledge. To date, its whole-genome sequence is unavailable and transcriptomic information is also scarce for this species. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for major tissues of Portunus trituberculatus using Illumina paired-end sequencing technology. Results Total RNA was isolated from eyestalk, gill, heart, hepatopancreas and muscle. Equal quantities of RNA from each tissue were pooled to construct a cDNA library. Using Illumina paired-end sequencing technology, we generated a total of 120,137 transcripts with an average length of 1037 bp. Further assembly analysis showed that all contigs contributed to 87,100 unigenes; of these, 16,029 unigenes (18.40% of the total) could be matched in the GenBank non-redundant database. Potential genes and their functions were predicted by GO, KEGG pathway mapping and COG analysis. Based on our sequence analysis and published literature, many putative genes with fundamental roles in growth and muscle development, including actin, myosin, tropomyosin, troponin and other potentially important candidate genes, were identified for the first time in this species. Furthermore, 22,673 SSRs and 66,191 high-confidence SNPs were identified in this EST dataset. Conclusion The transcriptome provides invaluable new data as a functional genomics resource for future biological research in Portunus trituberculatus. The data will also inform future functional studies to manipulate or select for genes influencing growth, which should find practical applications in aquaculture breeding programs. 
The molecular markers identified in this study will provide a material basis for future genetic linkage and quantitative trait loci analyses, and will be essential for accelerating aquaculture breeding programs with this species.
|
17
|
Zhang Y, Zheng Y, Li D, Fan Y. Transcriptomics and identification of the chemoreceptor superfamily of the pupal parasitoid of the oriental fruit fly, Spalangia endius Walker (Hymenoptera: Pteromalidae). PLoS One 2014; 9:e87800. [PMID: 24505315 PMCID: PMC3914838 DOI: 10.1371/journal.pone.0087800] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 08/23/2013] [Accepted: 12/30/2013] [Indexed: 12/16/2022] Open
Abstract
Background The oriental fruit fly, Bactrocera dorsalis Hendel, causes serious losses to fruit production and is one of the most economically important pests in many countries, including China. Spalangia endius Walker is a pupal parasitoid of various dipteran hosts, and may be considered a potentially important ectoparasitic pupal parasitoid of B. dorsalis. However, lack of genetic information on this organism is an obstacle to understanding the mechanisms behind its interaction with this host. Analysis of the S. endius transcriptome is essential to extend the genetic resources for this species and to support studies of S. endius on the host B. dorsalis. Methodology/Principal Findings We performed RNA-seq and de novo assembly for S. endius. We obtained nearly 10 Gbp of data using a HiSeq platform, and 36319 high-quality transcripts using Trinity software. A total of 22443 (61.79%) unigenes were aligned to homologous sequences in the jewel wasp and honeybee (Apis florae) protein set from public databases. A total of 10037 protein domains were identified in 7892 S. endius transcripts using HMMER3 software. We identified expression of six gustatory receptor and 21 odorant receptor genes in the sample, with only one gene having a high expression level in each family. The other genes had a low expression level, including two genes regulated by splicing. This result may be due to the wasps being kept under laboratory conditions. Additionally, a total of 3727 SSR markers were predicted, which could facilitate the identification of polymorphisms and functional genes within wasp populations. Conclusion/Significance This transcriptome greatly improves our genetic understanding of S. endius and provides a large number of gene sequences for further study.
Affiliation(s)
- Yuping Zhang
  - Plant Protection Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Provincial Key Laboratory of High Technology for Plant Protection, Guangzhou, Guangdong, China
  - Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Yuan Zheng
  - Plant Protection Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Provincial Key Laboratory of High Technology for Plant Protection, Guangzhou, Guangdong, China
  - Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Dunsong Li
  - Plant Protection Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Provincial Key Laboratory of High Technology for Plant Protection, Guangzhou, Guangdong, China
  - Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Yilin Fan
  - Plant Protection Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Provincial Key Laboratory of High Technology for Plant Protection, Guangzhou, Guangdong, China
  - Guangdong Academy of Agricultural Sciences, Guangzhou, China
|
18
|
Altman N, Leebens-Mack J, Zahn L, Chanderbali A, Tian D, Werner L, Ma H, dePamphilis C. Behind the Scenes: Planning a Multispecies Microarray Experiment. ACTA ACUST UNITED AC 2013. [DOI: 10.1080/09332480.2006.10722799] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
|
19
|
Sedeek KEM, Qi W, Schauer MA, Gupta AK, Poveda L, Xu S, Liu ZJ, Grossniklaus U, Schiestl FP, Schlüter PM. Transcriptome and proteome data reveal candidate genes for pollinator attraction in sexually deceptive orchids. PLoS One 2013; 8:e64621. [PMID: 23734209 PMCID: PMC3667177 DOI: 10.1371/journal.pone.0064621] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 11/28/2012] [Accepted: 04/17/2013] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Sexually deceptive orchids of the genus Ophrys mimic the mating signals of their pollinator females to attract males as pollinators. This mode of pollination is highly specific and leads to strong reproductive isolation between species. This study aims to identify candidate genes responsible for pollinator attraction and reproductive isolation between three closely related species, O. exaltata, O. sphegodes and O. garganica. Floral traits such as odour, colour and morphology are necessary for successful pollinator attraction. In particular, different odour hydrocarbon profiles have been linked to differences in specific pollinator attraction among these species. Therefore, the identification of genes involved in these traits is important for understanding the molecular basis of pollinator attraction by sexually deceptive orchids. RESULTS We have created floral reference transcriptomes and proteomes for these three Ophrys species using a combination of next-generation sequencing (454 and Solexa), Sanger sequencing, and shotgun proteomics (tandem mass spectrometry). In total, 121 917 unique transcripts and 3531 proteins were identified. This represents the first orchid proteome and transcriptome from the orchid subfamily Orchidoideae. Proteome data revealed proteins corresponding to 2644 transcripts and 887 proteins not observed in the transcriptome. Candidate genes for hydrocarbon and anthocyanin biosynthesis were represented by 156 and 61 unique transcripts in 20 and 7 gene classes, respectively. Moreover, transcription factors putatively involved in the regulation of flower odour, colour and morphology were annotated, including Myb, MADS and TCP factors. CONCLUSION Our comprehensive data set, generated by combining transcriptome and proteome technologies, allowed identification of candidate genes for pollinator attraction and reproductive isolation among sexually deceptive orchids. 
This includes genes for hydrocarbon and anthocyanin biosynthesis and regulation, and the development of floral morphology. These data will serve as an invaluable resource for research in orchid floral biology, enabling studies into the molecular mechanisms of pollinator attraction and speciation.
Affiliation(s)
- Khalid E M Sedeek
  - Institute of Systematic Botany & Zürich-Basel Plant Science Centre, University of Zurich, Zürich, Switzerland
|
20
|
Analysis of genome survey sequences and SSR marker development for Siamese Mud Carp, Henicorhynchus siamensis, using 454 pyrosequencing. Int J Mol Sci 2012; 13:10807-10827. [PMID: 23109823 PMCID: PMC3472715 DOI: 10.3390/ijms130910807] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/05/2012] [Revised: 07/30/2012] [Accepted: 08/24/2012] [Indexed: 11/17/2022] Open
Abstract
Siamese mud carp (Henicorhynchus siamensis) is a freshwater teleost of high economic importance in the Mekong River Basin. However, genetic data relevant for delineating wild stocks for management purposes are currently limited for this species. Here, we used 454 pyrosequencing to generate a partial genome survey sequence (GSS) dataset to develop simple sequence repeat (SSR) markers from H. siamensis genomic DNA. Data generated included a total of 65,954 sequence reads with an average length of 264 nucleotides, of which 2.79% contain SSR motifs. Based on GSS-BLASTx results, 10.5% of contigs and 8.1% of singletons possessed significant similarity (E value < 10^-5), with the majority matching well to reported fish sequences. KEGG analysis identified several metabolic pathways that provide insights into specific potential roles and functions of sequences involved in molecular processes in H. siamensis. Top protein domains detected included reverse transcriptase, and the top putative functional transcript identified was an ORF2-encoded protein. One thousand eight hundred and thirty-seven sequences containing SSR motifs were identified, of which 422 qualified for primer design, and eight polymorphic loci have been tested, with average observed and expected heterozygosity estimated at 0.75 and 0.83, respectively. Regardless of their relative levels of polymorphism and heterozygosity, microsatellite loci developed here are suitable for further population genetic studies in H. siamensis and may also be applicable to other related taxa.
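The abstract reports counts of reads containing SSR motifs without stating the detection criteria. As a rough illustration of what such a scan involves, the Python sketch below finds tandem repeats with a regular expression. It is not the authors' pipeline: the function name `find_ssrs`, the motif size range of 2 to 6 bp, and the default of at least 4 repeat units are assumptions chosen for this example, and published SSR tools typically apply motif-specific thresholds.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for simple sequence repeats.

    Illustrative thresholds only: motifs of 2-6 bp repeated at least
    `min_repeats` times in tandem. Homopolymer runs are skipped.
    """
    # ([ACGT]{2,6}?) captures the shortest candidate motif; \1{n,}
    # then requires at least n further tandem copies of it.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. "AA" repeated is a homopolymer, not an SSR here
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical read containing (AC)5 starting at position 3.
print(find_ssrs("TTGACACACACACGGT"))
```

Flanking sequence on both sides of each hit is what determines whether primers can be designed, which is one reason only a subset of SSR-containing reads (422 of 1,837 in this abstract) usually qualifies.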
|
21
|
Abstract
Applications of clustering algorithms in biomedical research are ubiquitous, with typical examples including gene expression data analysis, genomic sequence analysis, biomedical document mining, and MRI image analysis. However, due to the diversity of cluster analysis, the differing terminologies, goals, and assumptions underlying different clustering algorithms can be daunting. Thus, determining the right match between clustering algorithms and biomedical applications has become particularly important. This paper is presented to provide biomedical researchers with an overview of the status quo of clustering algorithms, to illustrate examples of biomedical applications based on cluster analysis, and to help biomedical researchers select the most suitable clustering algorithms for their own applications.
Affiliation(s)
- Rui Xu
  - Industrial Artificial Intelligence Laboratory, GE Global Research Center, Niskayuna, NY 12309, USA.
|
22
|
Pereiro P, Balseiro P, Romero A, Dios S, Forn-Cuni G, Fuste B, Planas JV, Beltran S, Novoa B, Figueras A. High-throughput sequence analysis of turbot (Scophthalmus maximus) transcriptome using 454-pyrosequencing for the discovery of antiviral immune genes. PLoS One 2012; 7:e35369. [PMID: 22629298 PMCID: PMC3356354 DOI: 10.1371/journal.pone.0035369] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Received: 12/09/2011] [Accepted: 03/16/2012] [Indexed: 02/01/2023] Open
Abstract
Background Turbot (Scophthalmus maximus L.) is an important aquacultural resource in both Europe and Asia. However, there is little information on gene sequences available in public databases. Currently, one of the main problems affecting the culture of this flatfish is mortality due to several pathogens, especially viral diseases, which are not treatable. In order to identify new genes involved in immune defense, we conducted 454-pyrosequencing of the turbot transcriptome after different immune stimulations. Methodology/Principal Findings Turbot were injected with viral stimuli to increase the expression level of immune-related genes. High-throughput deep sequencing using 454-pyrosequencing technology yielded 915,256 high-quality reads. These sequences were assembled into 55,404 contigs that were subjected to annotation steps. Intriguingly, 55.16% of the deduced proteins were not significantly similar to any sequences in the databases used for the annotation and only 0.85% of the BLASTx top hits matched S. maximus protein sequences. This relatively low level of annotation is possibly due to the limited information for this species and other flatfish in the database. These results suggest the identification of a large number of new genes in turbot and in fish in general. A more detailed analysis showed the presence of putative members of several innate and specific immune pathways. Conclusions/Significance To our knowledge, this study is the first transcriptome analysis using 454-pyrosequencing for turbot. Previously, there were only 12,471 ESTs and fewer than 1,500 nucleotide sequences for S. maximus in the NCBI database. Our results provide a rich source of data (55,404 contigs and 181,845 singletons) for discovering and identifying new genes, which will serve as a basis for microarray construction, gene expression characterization and identification of genetic markers to be used in several applications. 
Immune stimulation in turbot was very effective, yielding an enormous variety of sequences belonging to genes involved in defense mechanisms.
Affiliation(s)
- Pablo Balseiro
  - Instituto de Investigaciones Marinas, IIM, CSIC, Vigo, Spain
- Sonia Dios
  - Instituto de Investigaciones Marinas, IIM, CSIC, Vigo, Spain
- Berta Fuste
  - Centros Científicos y Tecnológicos de la UB, CCiT-UB, Universitat de Barcelona, Edifici Clúster, Parc Científic de Barcelona, Barcelona, Spain
- Josep V. Planas
  - Departament de Fisiologia i Immunologia, Facultat de Biologia, Universitat de Barcelona i Institut de Biomedicina de la Universitat de Barcelona, IBUB, Barcelona, Spain
- Sergi Beltran
  - Centros Científicos y Tecnológicos de la UB, CCiT-UB, Universitat de Barcelona, Edifici Clúster, Parc Científic de Barcelona, Barcelona, Spain
- Beatriz Novoa
  - Instituto de Investigaciones Marinas, IIM, CSIC, Vigo, Spain
- Antonio Figueras
  - Instituto de Investigaciones Marinas, IIM, CSIC, Vigo, Spain
|
23
|
Milnthorpe AT, Soloviev M. The use of EST expression matrixes for the quality control of gene expression data. PLoS One 2012; 7:e32966. [PMID: 22412959 PMCID: PMC3297614 DOI: 10.1371/journal.pone.0032966] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2011] [Accepted: 02/06/2012] [Indexed: 01/10/2023] Open
Abstract
EST expression profiling provides an attractive tool for studying differential gene expression, but cDNA libraries' origins and EST data quality are not always known or reported. Libraries may originate from pooled or mixed tissues; EST clustering, EST counts, library annotations and analysis algorithms may contain errors. Traditional data analysis methods, including research into tissue-specific gene expression, assume EST counts to be correct and libraries to be correctly annotated, which is not always the case. Therefore, a method capable of assessing the quality of expression data based on that data alone would be invaluable for assessing the quality of EST data and determining their suitability for mRNA expression analysis. Here we report an approach to the selection of a small generic subset of 244 UniGene clusters suitable for identification of the tissue of origin for EST libraries and quality control of the expression data using EST expression information alone. We created a small expression matrix of UniGene IDs using two rounds of selection followed by two rounds of optimisation. Our selection procedures differ from traditional approaches to finding "tissue-specific" genes. Our matrix yields consistently high positive correlation values for libraries with confirmed tissues of origin and can be applied for tissue typing and quality control of libraries as small as just a few hundred total ESTs. Furthermore, we can pick up tissue correlations between related tissues (e.g. brain and peripheral nervous tissue, heart and muscle tissues) and identify tissue origins for a few libraries of uncharacterised tissue identity. It was possible to confirm tissue identity for some libraries which have been derived from cancer tissues or have been normalised. Tissue matching is affected strongly by cancer progression or library normalisation, and our approach may potentially be applied for elucidating the stage of normalisation in normalised libraries or for cancer staging.
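The tissue-typing idea in this abstract (correlating a library's EST counts against reference profiles over a fixed set of marker clusters) can be illustrated with a minimal sketch. This is not the authors' matrix or procedure: the function names, the four-marker reference profiles and the library counts below are all made-up assumptions chosen only to show the correlation step.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant vector has no defined correlation; report 0.0 in that case.
    return cov / (sx * sy) if sx and sy else 0.0


def best_tissue(library_counts, reference):
    """Return the best-correlated tissue and all per-tissue scores."""
    scores = {t: pearson(library_counts, prof) for t, prof in reference.items()}
    return max(scores, key=scores.get), scores


# Hypothetical reference profiles over four marker clusters (toy numbers).
reference = {
    "brain": [9, 1, 0, 2],
    "liver": [0, 8, 1, 1],
    "muscle": [1, 0, 9, 3],
}
# Hypothetical EST counts for an unlabelled library over the same markers.
tissue, scores = best_tissue([18, 3, 1, 5], reference)
print(tissue, {t: round(s, 3) for t, s in scores.items()})
```

A library whose best score is low for every tissue would, under this sketch, be flagged for closer inspection rather than assigned a tissue.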
Affiliation(s)
- Andrew T. Milnthorpe
- School of Biological Sciences, CBMS, Royal Holloway University of London, Egham, Surrey, United Kingdom
- Mikhail Soloviev
- School of Biological Sciences, CBMS, Royal Holloway University of London, Egham, Surrey, United Kingdom
|
24
|
Abstract
Allelic variation within species provides fundamental insights into the evolution and ecology of organisms, and information about this variation is becoming increasingly available in sequence datasets of multiple and/or outbred individuals. Unfortunately, identifying true allelic variants poses a number of challenges, given the presence of both sequencing errors and alleles from other closely related loci. We outline the key considerations involved in this process, including assessing the accuracy of allele resolution in sequence assembly, clustering of alleles within and among individuals, and identifying clusters that are most likely to correspond to true allelic variants of a single locus. Our focus is particularly on the case where alleles must be identified without a fully resolved reference genome, and where sequence depth information cannot be used to infer the putative number of loci sharing a sequence, such as in transcriptome or post-assembly datasets. Throughout, we provide information about publicly available tools to aid allele identification in such cases.
Affiliation(s)
- Katrina M Dlugosch
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA.
|
25
|
Juhász A, Makai S, Sebestyén E, Tamás L, Balázs E. Role of conserved non-coding regulatory elements in LMW glutenin gene expression. PLoS One 2011; 6:e29501. [PMID: 22242127 PMCID: PMC3248431 DOI: 10.1371/journal.pone.0029501] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 09/28/2011] [Accepted: 11/29/2011] [Indexed: 02/02/2023] Open
Abstract
Transcriptional regulation of LMW glutenin genes was investigated in silico, using publicly available gene sequences and expression data. Genes were grouped into different LMW glutenin types and their promoter profiles were determined using cis-acting regulatory element databases and published results. The various cis-acting elements belong to some conserved non-coding regulatory regions (CREs) and might act in two different ways. There are elements, such as GCN4 motifs found in the long endosperm box, that could serve as key factors in tissue-specific expression. Some other elements, such as the AACA/TA motifs or the individual prolamin box variants, might modulate the level of expression. Based on the promoter sequences and expression characteristics, LMW glutenin genes might be transcribed following two different mechanisms. Most of the s- and i-type genes show a continuously increasing expression pattern. The m-type genes, however, demonstrate a normal distribution in their expression profiles. Differences observed in their expression could be related to the differences found in their promoter sequences. Polymorphisms in the number and combination of cis-acting elements in their promoter regions can be of crucial importance in the diverse levels of production of single LMW glutenin gene types.
Affiliation(s)
- Angéla Juhász
- Applied Genomics Department, Agricultural Research Institute of the Hungarian Academy of Sciences, Martonvásár, Hungary.
|
26
|
Jung H, Lyons RE, Dinh H, Hurwood DA, McWilliam S, Mather PB. Transcriptomics of a giant freshwater prawn (Macrobrachium rosenbergii): de novo assembly, annotation and marker discovery. PLoS One 2011; 6:e27938. [PMID: 22174756 PMCID: PMC3234237 DOI: 10.1371/journal.pone.0027938] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.0] [Received: 09/15/2011] [Accepted: 10/28/2011] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Giant freshwater prawn (Macrobrachium rosenbergii or GFP) is the most economically important freshwater crustacean species. However, as little is known about its genome, 454 pyrosequencing of cDNA was undertaken to characterise its transcriptome and identify genes important for growth. METHODOLOGY AND PRINCIPAL FINDINGS A collection of 787,731 sequence reads (244.37 Mb) obtained from 454 pyrosequencing analysis of cDNA prepared from muscle, ovary and testis tissues taken from 18 adult prawns was assembled into 123,534 expressed sequence tags (ESTs). Of these, 46% of the 8,411 contigs and 19% of the 115,123 singletons possessed high similarity to sequences in the GenBank non-redundant database, with most significant (E value < 1e-5) contig (80%) and singleton (84%) matches occurring with crustacean and insect sequences. KEGG analysis of the contig open reading frames identified putative members of several biological pathways potentially important for growth. The top InterProScan domains detected included RNA recognition motifs, serine/threonine-protein kinase-like domains, actin-like families, and zinc finger domains. Transcripts derived from genes such as actin, myosin heavy and light chain, tropomyosin and troponin, with fundamental roles in muscle development and construction, were abundant. Amongst the contigs, 834 single nucleotide polymorphisms, 1,198 indels and 658 simple sequence repeat motifs were also identified. CONCLUSIONS The M. rosenbergii transcriptome data reported here should provide an invaluable resource for improving our understanding of this species' genome structure and biology. The data will also instruct future functional studies to manipulate or select for genes influencing growth that should find practical applications in aquaculture breeding programs.
Affiliation(s)
- Hyungtaek Jung
- Biogeosciences, Queensland University of Technology, Brisbane, Queensland, Australia.
|
27
|
Bai X, Mamidala P, Rajarapu SP, Jones SC, Mittapalli O. Transcriptomics of the bed bug (Cimex lectularius). PLoS One 2011; 6:e16336. [PMID: 21283830 PMCID: PMC3023805 DOI: 10.1371/journal.pone.0016336] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.9] [Received: 10/11/2010] [Accepted: 12/10/2010] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Bed bugs (Cimex lectularius) are blood-feeding insects poised to become one of the major pests in households throughout the United States. Resistance of C. lectularius to insecticides/pesticides is one factor thought to be involved in its sudden resurgence. Despite its high-impact status, scant knowledge exists at the genomic level for C. lectularius. Hence, we subjected the C. lectularius transcriptome to 454 pyrosequencing in order to identify potential genes involved in pesticide resistance. METHODOLOGY AND PRINCIPAL FINDINGS Using 454 pyrosequencing, we obtained a total of 216,419 reads with 79,596,412 bp, which were assembled into 35,646 expressed sequence tags (3,902 contigs and 31,744 singletons). Nearly 85.9% of the C. lectularius sequences showed similarity to insect sequences, but 44.8% of the deduced proteins of C. lectularius did not show similarity with sequences in the GenBank non-redundant database. KEGG analysis revealed putative members of several detoxification pathways involved in pesticide resistance. Lamprin domains, Protein Kinase domains, Protein Tyrosine Kinase domains and cytochrome P450 domains were among the top Pfam domains predicted for the C. lectularius sequences. An initial assessment of putative defense genes, including a cytochrome P450 and a glutathione-S-transferase (GST), revealed high transcript levels for the cytochrome P450 (CYP9) in pesticide-exposed versus pesticide-susceptible C. lectularius populations. A significant number of single nucleotide polymorphisms (296) and microsatellite loci (370) were predicted in the C. lectularius sequences. Furthermore, 59 putative sequences of Wolbachia were retrieved from the database. CONCLUSIONS To our knowledge this is the first study to elucidate the genetic makeup of C. lectularius. This pyrosequencing effort provides clues to the identification of potential detoxification genes involved in pesticide resistance of C. lectularius and lays the foundation for future functional genomics studies.
Affiliation(s)
- Xiaodong Bai
- Department of Entomology, Ohio Agricultural and Research Development Center, The Ohio State University, Wooster, Ohio, United States of America
- Praveen Mamidala
- Department of Entomology, Ohio Agricultural and Research Development Center, The Ohio State University, Wooster, Ohio, United States of America
- Swapna P. Rajarapu
- Department of Entomology, Ohio Agricultural and Research Development Center, The Ohio State University, Wooster, Ohio, United States of America
- Susan C. Jones
- Department of Entomology, The Ohio State University, Columbus, Ohio, United States of America
- Omprakash Mittapalli
- Department of Entomology, Ohio Agricultural and Research Development Center, The Ohio State University, Wooster, Ohio, United States of America
|
28
|
Vidal RO, Mondego JMC, Pot D, Ambrósio AB, Andrade AC, Pereira LFP, Colombo CA, Vieira LGE, Carazzolle MF, Pereira GAG. A high-throughput data mining of single nucleotide polymorphisms in Coffea species expressed sequence tags suggests differential homeologous gene expression in the allotetraploid Coffea arabica. PLANT PHYSIOLOGY 2010; 154:1053-66. [PMID: 20864545 PMCID: PMC2971587 DOI: 10.1104/pp.110.162438] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 05/09/2023]
Abstract
Polyploidization constitutes a common mode of evolution in flowering plants. This event provides the raw material for the divergence of function in homeologous genes, leading to phenotypic novelty that can contribute to the success of polyploids in nature or their selection for use in agriculture. Mounting evidence underlined the existence of homeologous expression biases in polyploid genomes; however, strategies to analyze such transcriptome regulation remained scarce. Important factors regarding homeologous expression biases remain to be explored, such as whether this phenomenon influences specific genes, how paralogs are affected by genome doubling, and what is the importance of the variability of homeologous expression bias to genotype differences. This study reports the expressed sequence tag assembly of the allopolyploid Coffea arabica and one of its direct ancestors, Coffea canephora. The assembly was used for the discovery of single nucleotide polymorphisms through the identification of high-quality discrepancies in overlapped expressed sequence tags and for gene expression information indirectly estimated by the transcript redundancy. Sequence diversity profiles were evaluated within C. arabica (Ca) and C. canephora (Cc) and used to deduce the transcript contribution of the Coffea eugenioides (Ce) ancestor. The assignment of the C. arabica haplotypes to the C. canephora (CaCc) or C. eugenioides (CaCe) ancestral genomes allowed us to analyze gene expression contributions of each subgenome in C. arabica. In silico data were validated by the quantitative polymerase chain reaction and allele-specific combination TaqMAMA-based method. The presence of differential expression of C. arabica homeologous genes and its implications in coffee gene expression, ontology, and physiology are discussed.
|
29
|
Zahn LM, Ma X, Altman NS, Zhang Q, Wall PK, Tian D, Gibas CJ, Gharaibeh R, Leebens-Mack JH, dePamphilis CW, Ma H. Comparative transcriptomics among floral organs of the basal eudicot Eschscholzia californica as reference for floral evolutionary developmental studies. Genome Biol 2010; 11:R101. [PMID: 20950453 PMCID: PMC3218657 DOI: 10.1186/gb-2010-11-10-r101] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 06/11/2010] [Revised: 08/03/2010] [Accepted: 10/15/2010] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Molecular genetic studies of floral development have concentrated on several core eudicots and grasses (monocots), which have canalized floral forms. Basal eudicots possess a wider range of floral morphologies than the core eudicots and grasses and can serve as an evolutionary link between core eudicots and monocots, and provide a reference for studies of other basal angiosperms. Recent advances in genomics have enabled researchers to profile gene activities during floral development, primarily in the eudicot Arabidopsis thaliana and the monocots rice and maize. However, our understanding of floral developmental processes among the basal eudicots remains limited. RESULTS Using a recently generated expressed sequence tag (EST) set, we have designed an oligonucleotide microarray for the basal eudicot Eschscholzia californica (California poppy). We performed microarray experiments with an interwoven-loop design in order to characterize the E. californica floral transcriptome and to identify differentially expressed genes in flower buds with pre-meiotic and meiotic cells, four floral organs at preanthesis stages (sepals, petals, stamens and carpels), developing fruits, and leaves. CONCLUSIONS Our results provide a foundation for comparative gene expression studies between eudicots and basal angiosperms. We identified whorl-specific gene expression patterns in E. californica and examined the floral expression of several gene families. Interestingly, most E. californica homologs of Arabidopsis genes important for flower development, except for genes encoding MADS-box transcription factors, show different expression patterns between the two species. Our comparative transcriptomics study highlights the unique evolutionary position of E. californica compared with basal angiosperms and core eudicots.
Affiliation(s)
- Laura M Zahn
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: American Association for the Advancement of Science, 1200 New York Avenue NW, Washington DC 20005, USA
- Xuan Ma
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- The Intercollege Graduate Program in Cell and Developmental Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Naomi S Altman
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Qing Zhang
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: 2367 Setter Run Lane, State College, PA 16802, USA
- P Kerr Wall
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: BASF Plant Science, 26 Davis Drive, Research Triangle Park, NC 27709, USA
- Donglan Tian
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: Department of Entomology, The Pennsylvania State University, University Park, PA 16802, USA
- Cynthia J Gibas
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
- Raad Gharaibeh
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
- James H Leebens-Mack
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: Department of Plant Biology, University of Georgia, 120 Carlton Street, Athens, GA 30602, USA
- Claude W dePamphilis
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Hong Ma
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- The Intercollege Graduate Program in Cell and Developmental Biology, The Pennsylvania State University, University Park, PA 16802, USA
- State Key Laboratory of Genetic Engineering and School of Life Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China
- Institutes of Biomedical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai 200032, China
|
30
|
Rao DM, Moler JC, Ozden M, Zhang Y, Liang C, Karro JE. PEACE: Parallel Environment for Assembly and Clustering of Gene Expression. Nucleic Acids Res 2010; 38:W737-42. [PMID: 20522511 PMCID: PMC2896108 DOI: 10.1093/nar/gkq470] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/04/2022] Open
Abstract
We present PEACE, a stand-alone tool for high-throughput ab initio clustering of transcript fragment sequences produced by Next Generation or Sanger Sequencing technologies. It is freely available from www.peace-tools.org. Installed and managed through a downloadable user-friendly graphical user interface (GUI), PEACE can process large data sets of transcript fragments of length 50 bases or greater, grouping the fragments by gene associations with a sensitivity comparable to leading clustering tools. Once clustered, the user can employ the GUI's analysis functions, facilitating the easy collection of statistics and allowing them to single out specific clusters for more comprehensive study or assembly. Using a novel minimum spanning tree-based clustering method, PEACE is the equal of leading tools in the literature, with an interface making it accessible to any user. It produces results of quality virtually identical to those of the WCD tool when applied to Sanger sequences, significantly improved results over WCD and TGICL when applied to the products of Next Generation Sequencing Technology and significantly improved results over Cap3 in both cases. In short, PEACE provides an intuitive GUI and a feature-rich, parallel clustering engine that proves to be a valuable addition to the leading cDNA clustering tools.
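The general shape of minimum spanning tree-based clustering, which the abstract names as PEACE's approach, can be sketched as follows. This is an illustration of the generic technique only, not PEACE's implementation: the k-mer Jaccard distance, the `threshold` parameter and all function names are assumptions introduced here. The sketch builds an MST over all fragments with Prim's algorithm, then cuts every tree edge longer than the threshold so that the remaining connected components form the clusters.

```python
def kmer_set(seq, k=4):
    """All overlapping k-mers of a sequence (assumed toy feature set)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; a stand-in metric, not the one PEACE uses."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def mst_clusters(seqs, threshold, k=4):
    """Cluster sequences by cutting MST edges longer than `threshold`."""
    n = len(seqs)
    if n == 0:
        return []
    kmers = [kmer_set(s, k) for s in seqs]

    # Prim's algorithm on the complete distance graph.
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = jaccard_distance(kmers[u], kmers[v])
                if d < best[v]:
                    best[v], parent[v] = d, u

    # Keep only short MST edges; union-find collects the components.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v, d in edges:
        if d <= threshold:
            root[find(u)] = find(v)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy fragments: two overlapping pairs from two unrelated "transcripts".
fragments = ["ACGTACGTAC", "CGTACGTACG", "TTTTGGGGCC", "TTTGGGGCCA"]
print(mst_clusters(fragments, threshold=0.5))
```

Cutting the tree rather than the full graph means only n - 1 edges need to be inspected once the MST is built, which is part of why MST-based schemes are attractive for large fragment sets.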
Affiliation(s)
- D M Rao
- Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio 45056, USA
|
31
|
Ballester B, Johnson N, Proctor G, Flicek P. Consistent annotation of gene expression arrays. BMC Genomics 2010; 11:294. [PMID: 20459806 PMCID: PMC2894801 DOI: 10.1186/1471-2164-11-294] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 07/20/2009] [Accepted: 05/11/2010] [Indexed: 02/03/2023] Open
Abstract
Background Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases. Results We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor. Conclusions Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.
Affiliation(s)
- Benoît Ballester
- European Bioinformatics Institute EMBL, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
32
|
Funari VA, Voevodski K, Leyfer D, Yerkes L, Cramer D, Tolan DR. Quantitative gene expression profiles in real time from expressed sequence tag databases. Gene Expr 2010; 14:321-36. [PMID: 20635574 PMCID: PMC2954622 DOI: 10.3727/105221610x12717040569820] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
An accumulation of expressed sequence tag (EST) data in the public domain and the availability of bioinformatic programs have made EST gene expression profiling a common practice. However, the utility and validity of using EST databases (e.g., dbEST) have been criticized, particularly for quantitative assessment of gene expression. Problems with EST sequencing errors, library construction, EST annotation, and multiple paralogs make generation of specific and sensitive qualitative and quantitative expression profiles a concern. In addition, most EST-derived expression data exist in previously assembled databases. The Virtual Northern Blot (VNB) (http://tlab.bu.edu/vnb.html) allows generation, evaluation, and optimization of expression profiles in real time, which is especially important for alternatively spliced, novel, or poorly characterized genes. Representative gene families with variable nucleotide sequence identity, tissue specificity, and levels of expression (bcl-xl, aldoA, and cyp2d9) are used to assess the quality of VNB's output. The profiles generated by VNB are more sensitive and specific than those constructed with ESTs listed in preindexed databases at UCSC and NCBI. Moreover, quantitative expression profiles produced by VNB are comparable to quantization obtained from Northern blots and qPCR. The VNB pipeline generates real-time gene expression profiles for single-gene queries that are both qualitatively and quantitatively reliable.
Affiliation(s)
- Dimitry Leyfer
- †Bioinformatics Program, Boston University, Boston, MA, USA
- Laura Yerkes
- *Biology Department, Boston University, Boston, MA, USA
- Donald Cramer
- *Biology Department, Boston University, Boston, MA, USA
- Dean R. Tolan
- *Biology Department, Boston University, Boston, MA, USA
- †Bioinformatics Program, Boston University, Boston, MA, USA
|
33
|
Gou X, Yuan T, Wei X, Russell SD. Gene expression in the dimorphic sperm cells of Plumbago zeylanica: transcript profiling, diversity, and relationship to cell type. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2009; 60:33-47. [PMID: 19500307 DOI: 10.1111/j.1365-313x.2009.03934.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 05/19/2023]
Abstract
Plumbago zeylanica produces cytoplasmically dimorphic sperm cells that target the egg and central cell during fertilization. In mature pollen, the larger sperm cell contains numerous mitochondria, is associated with the vegetative nucleus (S(vn)), and fuses preferentially with the central cell, forming endosperm. The other, plastid-enriched sperm cell (S(ua)) fuses with the egg cell, forming the zygote and embryo. Sperm expressed genes were investigated using ESTs produced from each sperm type; differential expression was validated through suppression subtractive hybridization, custom microarrays, real-time RT-PCR and in situ hybridization. The expression profiles of dimorphic sperm cells reflect a diverse and broad complement of genes, including high proportions of conserved and unknown genes, as well as distinct patterns of expression. A number of genes were highly up-regulated in the male germ line, including some genes that were differentially expressed in either the S(ua) or the S(vn). Differentially up-regulated genes in the egg-targeted S(ua) showed increased expression in transcription and translation categories, whereas the central cell-targeted S(vn) displayed expanded expression in the hormone biosynthesis category. Interestingly, the up-regulated genes expressed in the sperm cells appeared to reflect the expected post-fusion profiles of the future embryo and endosperm. As sperm cytoplasm is known to be transmitted during fertilization in this plant, sperm-contributed mRNAs are probably transported during fertilization, which could influence early embryo and endosperm development.
Affiliation(s)
- Xiaoping Gou
- Department of Botany, University of Oklahoma, Norman, OK 73019, USA
|
34
|
Bragg LM, Stone G. k-link EST clustering: evaluating error introduced by chimeric sequences under different degrees of linkage. Bioinformatics 2009; 25:2302-8. [PMID: 19570806 PMCID: PMC2735666 DOI: 10.1093/bioinformatics/btp410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022] Open
Abstract
Motivation: The clustering of expressed sequence tags (ESTs) is a crucial step in many sequence analysis studies that require a high level of redundancy. Chimeric sequences, while uncommon, can make achieving the optimal EST clustering a challenge. Single-linkage algorithms are particularly vulnerable to the effects of chimeras. To avoid chimera-facilitated erroneous merges, researchers using single-linkage algorithms are forced to use stringent sequence-similarity thresholds. Such thresholds reduce the sensitivity of the clustering algorithm. Results: We introduce the concept of k-link clustering for EST data. We evaluate how clustering error rates vary over a range of linkage thresholds. Using k-link, we show that Type II error decreases in response to increasing the number of shared ESTs (i.e. links) required. We observe a base level of Type II error likely caused by the presence of unmasked low-complexity or repetitive sequence. We find that Type I error increases gradually with increased linkage. To minimize the Type I error introduced by increased linkage requirements, we propose an extension to k-link which modifies the required number of links with respect to the size of clusters being compared. Availability: k-link is written in C++, is licensed under the GNU General Public License, and can be downloaded from http://www.bioinformatics.csiro.au/products.shtml. Contact: lauren.bragg@csiro.au Supplementary information: Supplementary data are available at Bioinformatics online.
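The effect of requiring more than one link before merging can be sketched in a few lines. The abstract does not spell out the algorithm, so the interpretation below is an assumption: each EST seeds a candidate cluster made of itself plus every EST similar to it, and two candidate clusters are merged only when they share at least k ESTs. The function name, the input format and the toy data are hypothetical, and this is not the authors' C++ implementation.

```python
def k_link_clusters(n_seqs, similar_pairs, k):
    """Merge seed neighbourhoods that share at least k ESTs.

    Assumed interpretation of k-link; k=1 behaves like single linkage.
    Clusters may still overlap after merging stops.
    """
    # Seed one candidate cluster per EST: the EST plus everything similar to it.
    neighbourhoods = [{i} for i in range(n_seqs)]
    for a, b in similar_pairs:
        neighbourhoods[a].add(b)
        neighbourhoods[b].add(a)

    # Drop duplicate seeds so identical neighbourhoods are counted once.
    clusters = []
    for seed in neighbourhoods:
        if seed not in clusters:
            clusters.append(set(seed))

    # Repeatedly merge any two clusters sharing at least k ESTs.
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if len(clusters[i] & clusters[j]) >= k:
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return sorted(sorted(c) for c in clusters)


# Toy data: ESTs 0-2 come from one gene, 3-5 from another, and the pair
# (2, 3) stands for a single spurious similarity caused by a chimera.
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(k_link_clusters(6, pairs, k=1))  # one merged cluster
print(k_link_clusters(6, pairs, k=3))  # two clusters, overlapping at the chimera
```

In this toy case a single chimeric similarity is enough to fuse the two genes at k=1, while at k=3 the two groups stay apart and only the ESTs touching the chimera appear in both.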
Affiliation(s)
- Lauren M Bragg
- CSIRO Mathematical and Information Sciences, North Ryde, NSW, Australia.
|
35
|
Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang H, Landherr L, Tomsho LP, Hu Y, Carlson JE, Ma H, Schuster SC, Soltis DE, Soltis PS, Altman N, dePamphilis CW. Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics 2009; 10:347. [PMID: 19646272 PMCID: PMC2907694 DOI: 10.1186/1471-2164-10-347] [Citation(s) in RCA: 157] [Impact Index Per Article: 10.5] [Received: 08/01/2008] [Accepted: 08/01/2009] [Indexed: 11/10/2022] Open
Abstract
Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online web tool that allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. 
In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms.
Affiliation(s)
- P Kerr Wall
- Department of Biology, Institute of Molecular Evolutionary Genetics, and The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA.
36
Picardi E, Mignone F, Pesole G. EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data. BMC Bioinformatics 2009; 10 Suppl 6:S10. [PMID: 19534735 PMCID: PMC2697633 DOI: 10.1186/1471-2105-10-s6-s10] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/31/2023] Open
Abstract
Background ESTs and full-length cDNAs represent an invaluable source of evidence for inferring reliable gene structures and discovering potential alternative splicing events. In newly sequenced genomes, these tasks may not be practicable owing to the lack of appropriate training sets. However, when expression data are available, they can be used to build EST clusters related to specific genomic transcribed loci. Common strategies recently employed to this end are based on sequence similarity between transcripts and can lead, in specific conditions, to inconsistent and erroneous clustering. In order to improve the cluster building and facilitate all downstream annotation analyses, we developed a simple genome-based methodology to generate gene-oriented clusters of ESTs when a genomic sequence and a pool of related expressed sequences are provided. Our procedure has been implemented in the software EasyCluster and takes into account the spliced nature of ESTs after an ad hoc genomic mapping. Methods EasyCluster uses the well-known GMAP program in order to perform a very quick EST-to-genome mapping in addition to the detection of reliable splice sites. Given a genomic sequence and a pool of ESTs/FL-cDNAs, EasyCluster starts building genomic and EST local databases and runs GMAP. Subsequently, it parses results creating an initial collection of pseudo-clusters by grouping ESTs according to the overlap of their genomic coordinates on the same strand. In the final step, EasyCluster refines the clustering by again running GMAP on each pseudo-cluster and groups together ESTs sharing at least one splice site. Results The higher accuracy of EasyCluster with respect to other clustering tools has been verified by means of a manually curated benchmark of human EST clusters. 
Additional datasets including the Unigene cluster Hs.122986 and ESTs related to the human HOXA gene family have also been used to demonstrate the better clustering capability of EasyCluster over current genome-based web service tools such as ASmodeler and BIPASS. EasyCluster has also been used to provide a first compilation of gene-oriented clusters in the Ricinus communis oilseed plant for which no Unigene clusters are yet available, as well as an evaluation of the alternative splicing in this plant species.
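The two grouping steps described in the Methods (overlap of genomic coordinates on the same strand, then sharing of at least one splice site) can be sketched in Python as follows. This is an illustration only, not EasyCluster's code: it assumes the EST-to-genome alignments have already been produced and parsed into tuples, and the names `pseudo_clusters` and `refine_by_splice_site`, the tuple layout, and the use of inclusive integer coordinates are hypothetical choices made here.

```python
from collections import defaultdict

# Each alignment is a hypothetical tuple:
#   (est_id, strand, start, end, splice_sites)
# with inclusive genomic coordinates and splice_sites a set of positions.


def pseudo_clusters(alignments):
    """Group ESTs whose genomic spans overlap on the same strand."""
    by_strand = defaultdict(list)
    for aln in alignments:
        by_strand[aln[1]].append(aln)

    groups = []
    for strand in sorted(by_strand):
        current, current_end = [], None
        # Sweep left to right; a gap between spans closes the current group.
        for aln in sorted(by_strand[strand], key=lambda a: (a[2], a[3])):
            if current and aln[2] > current_end:
                groups.append(current)
                current, current_end = [], None
            current.append(aln)
            current_end = aln[3] if current_end is None else max(current_end, aln[3])
        if current:
            groups.append(current)
    return groups


def refine_by_splice_site(group):
    """Within one pseudo-cluster, join ESTs sharing at least one splice site."""
    parent = {aln[0]: aln[0] for aln in group}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # splice site -> first EST observed with that site
    for est_id, _, _, _, sites in group:
        for site in sites:
            if site in first_seen:
                parent[find(est_id)] = find(first_seen[site])
            else:
                first_seen[site] = est_id

    clusters = defaultdict(list)
    for est_id in parent:
        clusters[find(est_id)].append(est_id)
    return sorted(sorted(c) for c in clusters.values())


alignments = [
    ("e1", "+", 100, 500, {200, 300}),
    ("e2", "+", 150, 600, {200, 300}),
    ("e3", "+", 450, 700, set()),        # unspliced EST
    ("e4", "-", 120, 400, {180, 260}),   # opposite strand
    ("e5", "+", 1000, 1500, {1100, 1200}),
]
groups = pseudo_clusters(alignments)
print([[a[0] for a in g] for g in groups])
print(refine_by_splice_site(groups[0]))
```

In this toy input, `e3` overlaps `e1` and `e2` and so lands in the same pseudo-cluster, but it shares no splice site with them and is separated in the refinement step. How unspliced ESTs are actually handled by EasyCluster is not stated in the abstract.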
Affiliation(s)
- Ernesto Picardi
- Dipartimento di Biochimica e Biologia Molecolare E, Quagliariello, Università degli Studi di Bari, 70126 Bari, Italy.
37
Venancio TM, Cristofoletti PT, Ferreira C, Verjovski-Almeida S, Terra WR. The Aedes aegypti larval transcriptome: a comparative perspective with emphasis on trypsins and the domain structure of peritrophins. Insect Molecular Biology 2009; 18:33-44. [PMID: 19054160 DOI: 10.1111/j.1365-2583.2008.00845.x] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 05/27/2023]
Abstract
The genome sequence of Aedes aegypti was recently reported. A significant number of expressed sequence tags (ESTs) were sequenced to aid in the gene prediction process. In the present work we describe an integrated analysis of the genomic and EST data, focusing on genes with preferential expression in larvae (LG), adults (AG) and in both stages (SG). A total of 913 genes (5.4% of the transcript complement) are LG, including ion transporters and cuticle proteins that are important for ion homeostasis and defense. From a starting set of 245 genes encoding the trypsin domain, we identified 66 putative LG, AG, and SG trypsins by manual curation. Phylogenetic analyses showed that AG trypsins are divergent from their larval counterparts (LG), grouping with blood-induced trypsins from Anopheles gambiae and Simulium vittatum. These results support the hypothesis that blood-feeding arose only once, in the ancestral Culicomorpha. Peritrophins are proteins that interlock chitin fibrils to form the peritrophic membrane (PM) that compartmentalizes the food in the midgut. These proteins are recognized by having chitin-binding domains with 6 conserved Cys and may also present mucin-like domains (regions expected to be highly O-glycosylated). PM may be formed by a ring of cells (type 2, seen in Ae. aegypti larvae and Drosophila melanogaster) or by most midgut cells (type 1, found in Ae. aegypti adult and Tribolium castaneum). LG and D. melanogaster peritrophins have more complex domain structures than AG and T. castaneum peritrophins. Furthermore, mucin-like domains of peritrophins from T. castaneum (feeding on rough food) are longer than those of adult Ae. aegypti (blood-feeding). This suggests, for the first time, that type 1 and type 2 PM may have variable molecular architectures determined by different peritrophins and/or ancillary proteins, which may be partly modulated by diet.
Affiliation(s)
- T M Venancio
- Laboratory of Bioinformatics, Universidade de São Paulo, São Paulo, Brazil
38
Almeida FC, Desalle R. Orthology, function and evolution of accessory gland proteins in the Drosophila repleta group. Genetics 2009; 181:235-45. [PMID: 19015541 PMCID: PMC2621172 DOI: 10.1534/genetics.108.096263] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 09/15/2008] [Accepted: 11/10/2008] [Indexed: 01/03/2023] Open
Abstract
The accessory gland proteins (Acps) of Drosophila have become a model for the study of reproductive protein evolution. A major step in the study of Acps is to identify biological causes and consequences of the observed patterns of molecular evolution by comparing species groups with different biology. Here we characterize the Acp complement of Drosophila mayaguana, a repleta group representative. Species of this group show important differences in ecology and reproduction as compared to other Drosophila. Our results show that the extremely high rates of Acp evolution previously found are likely to be ubiquitous among species of the repleta group. These evolutionary rates are considerably higher than the ones observed in other Drosophila groups' Acps. This disparity, however, is not accompanied by major differences in the estimated number of Acps or in the functional categories represented as previously suggested. Among the genes expressed in accessory glands of D. mayaguana almost half are likely products of recent duplications. This allowed us to test predictions of the neofunctionalization model for gene duplication and paralog evolution in a more or less constrained timescale. We found that positive selection is a strong force in the early divergence of these gene pairs.
39
Susko E, Roger AJ. Statistical analysis of expressed sequence tags. Methods Mol Biol 2009; 533:277-287. [PMID: 19277567 DOI: 10.1007/978-1-60327-136-3_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
Expressed sequence tag (EST) surveys are an efficient way to characterize large numbers of genes from an organism. The rate of gene discovery in an EST survey depends on the degree of redundancy of the cDNA libraries from which sequences are obtained. We consider statistics for the comparison of EST libraries based upon the frequencies with which genes occur in subsamples of reads. These measures are useful in determining which of the libraries, having a large proportion of genes in common, is more likely to yield new genes in future reads. We also present tests, with adjustments for multiple comparisons, of whether genes are equally represented or expressed in a pair of libraries.
Affiliation(s)
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
40
Marakhonov AV, Baranova AV, Skoblov MY. Antisense regulation of human gene MAP3K13: True phenomenon or artifact? Mol Biol 2008. [DOI: 10.1134/s0026893308040055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
41
Lewers KS, Saski CA, Cuthbertson BJ, Henry DC, Staton ME, Main DS, Dhanaraj AL, Rowland LJ, Tomkins JP. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers. BMC Plant Biology 2008; 8:69. [PMID: 18570660 PMCID: PMC2474608 DOI: 10.1186/1471-2229-8-69] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/25/2008] [Accepted: 06/20/2008] [Indexed: 05/03/2023]
Abstract
BACKGROUND The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSR), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. RESULTS A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. CONCLUSION This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to complement existing morphological marker-assisted breeding in blackberry.
Affiliation(s)
- Kim S Lewers
- USDA-ARS, Beltsville Agricultural Research Center, Genetic Improvement of Fruits and Vegetables Lab, Bldg. 010A, BARC-West, 10300 Baltimore Ave., Beltsville, MD 20705-2350, USA
- Chris A Saski
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- Brandon J Cuthbertson
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- National Institutes of Health/National Institute of Environmental Health Sciences, Laboratory of Signal Transduction, Peptide Hormone Action Group, 111 TW Alexander Drive, PO Box 12233, MD F3-04 Research Triangle Park, NC 27709-2233, USA
- David C Henry
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- Meg E Staton
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- Dorrie S Main
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- Center for Integrated Biotechnology, Dept of Horticulture and Landscape Architecture, Washington State University, 45 Johnson Hall, Pullman, WA 99164-6414, USA
- Anik L Dhanaraj
- USDA-ARS, Beltsville Agricultural Research Center, Genetic Improvement of Fruits and Vegetables Lab, Bldg. 010A, BARC-West, 10300 Baltimore Ave., Beltsville, MD 20705-2350, USA
- Monsanto Research Centre, Biotech Product Support, 44/2A Bellary Road, NH-7, Hebbal, Bangalore 560 092, India
- Lisa J Rowland
- USDA-ARS, Beltsville Agricultural Research Center, Genetic Improvement of Fruits and Vegetables Lab, Bldg. 010A, BARC-West, 10300 Baltimore Ave., Beltsville, MD 20705-2350, USA
- Jeff P Tomkins
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
42
Freeman RM, Wu M, Cordonnier-Pratt MM, Pratt LH, Gruber CE, Smith M, Lander ES, Stange-Thomann N, Lowe CJ, Gerhart J, Kirschner M. cDNA sequences for transcription factors and signaling proteins of the hemichordate Saccoglossus kowalevskii: efficacy of the expressed sequence tag (EST) approach for evolutionary and developmental studies of a new organism. The Biological Bulletin 2008; 214:284-302. [PMID: 18574105 DOI: 10.2307/25470670] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 05/26/2023]
Abstract
We describe a collection of expressed sequence tags (ESTs) for Saccoglossus kowalevskii, a direct-developing hemichordate valuable for evolutionary comparisons with chordates. The 202,175 ESTs represent 163,633 arrayed clones carrying cDNAs prepared from embryonic libraries, and they assemble into 13,677 continuous sequences (contigs), leaving 10,896 singletons (excluding mitochondrial sequences). Of the contigs, 53% had significant matches when BLAST was used to query the NCBI databases (≤ 10^-10), as did 51% of the singletons. Contigs most frequently matched sequences from amphioxus (29%), chordates (67%), and deuterostomes (87%). From the clone array, we isolated 400 full-length sequences for transcription factors and signaling proteins of use for evolutionary and developmental studies. The set includes sequences for fox, pax, tbx, hox, and other homeobox-containing factors, and for ligands and receptors of the TGFβ, Wnt, Hh, Delta/Notch, and RTK pathways. At least 80% of key sequences have been obtained, when judged against gene lists of model organisms. The median length of these cDNAs is 2.3 kb, including 1.05 kb of 3' untranslated region (UTR). Only 30% are entirely matched by single contigs assembled from ESTs. We conclude that an EST collection based on 150,000 clones is a rich source of sequences for molecular developmental work, and that the EST approach is an efficient way to initiate comparative studies of a new organism.
Affiliation(s)
- R M Freeman
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
43
Cervigni GDL, Paniego N, Díaz M, Selva JP, Zappacosta D, Zanazzi D, Landerreche I, Martelotto L, Felitti S, Pessino S, Spangenberg G, Echenique V. Expressed sequence tag analysis and development of gene associated markers in a near-isogenic plant system of Eragrostis curvula. Plant Molecular Biology 2008; 67:1-10. [PMID: 18196464 DOI: 10.1007/s11103-007-9282-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 02/15/2007] [Accepted: 12/22/2007] [Indexed: 05/05/2023]
Abstract
Eragrostis curvula (Schrad.) Nees is a forage grass native to the semiarid regions of Southern Africa, which reproduces mainly by pseudogamous diplosporous apomixis. A collection of ESTs was generated from four cDNA libraries, three of them obtained from panicles of near-isogenic lines with different ploidy levels and reproductive modes, and one obtained from leaves of 12-day-old plants. A total of 12,295 high-quality ESTs were clustered and assembled, rendering 8,864 unigenes, including 1,490 contigs and 7,394 singletons, with a genome coverage of 22%. A total of 7,029 (79.11%) unigenes were functionally categorized by BLASTX analysis against sequences deposited in public databases, but only 37.80% could be classified according to Gene Ontology. Sequence comparison against the cereal gene indices (GI) revealed 50% significant hits. A total of 254 EST-SSRs were detected, 219 from singletons and 35 from contigs. Di- and tri-nucleotide motifs were similarly represented, with percentages of 38.95 and 40.16%, respectively. In addition, 190 SNPs and indels were detected in 18 contigs generated from 3 to 4 libraries. The ESTs and the molecular markers obtained in this study will provide valuable resources for a wide range of applications including gene identification, genetic mapping, cultivar identification, analysis of genetic diversity, phenotype mapping and marker-assisted selection.
Affiliation(s)
- Gerardo D L Cervigni
- Centro de Recursos Naturales Renovables de la Zona Semiárida-CONICET, Camino de La Carrindanga Km 7.0, Bahia Blanca, Argentina
44
Schloss PD, Handelsman J. A statistical toolbox for metagenomics: assessing functional diversity in microbial communities. BMC Bioinformatics 2008; 9:34. [PMID: 18215273 PMCID: PMC2238731 DOI: 10.1186/1471-2105-9-34] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Received: 05/08/2007] [Accepted: 01/23/2008] [Indexed: 11/17/2022] Open
Abstract
Background The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. Results Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. Conclusion The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.
Affiliation(s)
- Patrick D Schloss
- Department of Microbiology, University of Massachusetts - Amherst, Amherst, MA 01003, USA.
45
Sakurai T, Plata G, Rodríguez-Zapata F, Seki M, Salcedo A, Toyoda A, Ishiwata A, Tohme J, Sakaki Y, Shinozaki K, Ishitani M. Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response. BMC Plant Biology 2007; 7:66. [PMID: 18096061 PMCID: PMC2245942 DOI: 10.1186/1471-2229-7-66] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 06/12/2007] [Accepted: 12/20/2007] [Indexed: 05/18/2023]
Abstract
BACKGROUND Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses, is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and postharvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs). RESULTS The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories, showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. CONCLUSION The cassava full-length cDNA library presented here contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome.
Affiliation(s)
- Tetsuya Sakurai
- Metabolomics Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Germán Plata
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Fausto Rodríguez-Zapata
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Motoaki Seki
- Plant Functional Genomics Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Andrés Salcedo
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Atsushi Toyoda
- Genome Core Technology Facilities, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Atsushi Ishiwata
- Metabolomics Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Joe Tohme
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Yoshiyuki Sakaki
- Genome Core Technology Facilities, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Kazuo Shinozaki
- Plant Functional Genomics Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Manabu Ishitani
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
46
Peng FY, Reid KE, Liao N, Schlosser J, Lijavetzky D, Holt R, Martínez Zapater JM, Jones S, Marra M, Bohlmann J, Lund ST. Generation of ESTs in Vitis vinifera wine grape (Cabernet Sauvignon) and table grape (Muscat Hamburg) and discovery of new candidate genes with potential roles in berry development. Gene 2007; 402:40-50. [PMID: 17761391 DOI: 10.1016/j.gene.2007.07.016] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 03/08/2007] [Revised: 06/26/2007] [Accepted: 07/17/2007] [Indexed: 11/30/2022]
Abstract
We report the generation and analysis of a total of 77,583 expressed sequence tags (ESTs) from two grapevine (Vitis vinifera L.) cultivars, Cabernet Sauvignon (wine grape) and Muscat Hamburg (table grape) with a focus on EST sequence quality and assembly optimization. The majority of the ESTs were derived from normalized cDNA libraries representing berry pericarp and seed developmental series, pooled non-berry tissues including root, flower, and leaf in Cabernet Sauvignon, and pooled tissues of berry, seed, and flower in Muscat Hamburg. EST and unigene sequence quality were determined by computational filtering coupled with small-scale contig reassembly, manual review, and BLAST analyses. EST assembly was optimized to better discriminate among closely related paralogs using two independent grape sequence sets, a previously published set of Vitis spp. gene families and our EST dataset derived from pooled leaf, flower, and root tissues of Cabernet Sauvignon. Sequence assembly within individual libraries indicated that those prepared from pooled tissues contributed the most to gene discovery. Annotations based upon searches against multiple databases including tomato and strawberry sequences helped to identify putative functions of ESTs and unigenes, particularly with respect to fleshy fruit development. Sequence comparison among the three wine grape libraries identified a number of genes preferentially expressed in the pericarp tissue, including transcription factors, receptor-like protein kinases, and hexose transporters. Gene ontology (GO) classification in the biological process aspect showed that GO categories corresponding to 'transport' and 'cell organization and biogenesis', which are associated with metabolite movement and cell wall structural changes during berry ripening, were higher in pericarp than in other tissues in the wine grape studied. The sequence data were used to characterize potential roles of new genes in berry development and composition.
Affiliation(s)
- Fred Y Peng
- Wine Research Centre, Faculty of Land and Food Systems, University of British Columbia, 2205 East Mall, Vancouver, British Columbia, Canada V6T 1Z4
47
Lijoi A, Mena RH, Prünster I. A Bayesian nonparametric method for prediction in EST analysis. BMC Bioinformatics 2007; 8:339. [PMID: 17868445 PMCID: PMC2220008 DOI: 10.1186/1471-2105-8-339] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 02/27/2007] [Accepted: 09/14/2007] [Indexed: 11/30/2022] Open
Abstract
Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
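For orientation, the kind of quantity being predicted can be illustrated with the classical Good-Turing estimate, a frequentist baseline and not the Bayesian nonparametric estimator proposed in this paper. The sketch below assumes each read has already been assigned to a gene or cluster identifier; the function name `good_turing_discovery` and the returned keys are hypothetical. It estimates the probability that the next read reveals a new gene as the fraction of reads whose gene has been seen exactly once, and reports one minus that value as the estimated sample coverage (in the Good-Turing sense of expression mass already observed, which is not identical to the gene-count definition given in the abstract).

```python
from collections import Counter


def good_turing_discovery(reads):
    """Good-Turing estimates from a list of per-read gene identifiers.

    Returns the number of distinct genes observed, the estimated probability
    that the next read comes from a gene not yet seen (n1 / n, where n1 is
    the number of genes seen exactly once), and the complementary coverage.
    """
    n = len(reads)
    if n == 0:
        raise ValueError("at least one read is required")
    gene_counts = Counter(reads)
    singletons = sum(1 for c in gene_counts.values() if c == 1)
    p_new = singletons / n
    return {
        "genes_observed": len(gene_counts),
        "p_new_gene": p_new,
        "coverage": 1.0 - p_new,
    }


# Toy sample: 8 reads over 5 genes, 3 of which were seen only once.
sample = ["g1", "g1", "g2", "g3", "g3", "g3", "g4", "g5"]
print(good_turing_discovery(sample))
```

This single-step estimate says nothing about large future samples, which is the regime where the abstract reports that frequentist estimators become unstable.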
Affiliation(s)
- Antonio Lijoi
- Department of Economics and Quantitative Methods, University of Pavia, 27100 Pavia and Institute for Applied Mathematics and Information Technology, National Research Council, 20133 Milan, Italy
- Ramsés H Mena
- Research Institute for Applied Mathematics and Systems, National Autonomous University of Mexico, Mexico City, A.P. 20-726, Mexico
- Igor Prünster
- Department of Statistics and Applied Mathematics and ICER, University of Turin, 10122 Turin and Carlo Alberto College, 10024 Moncalieri, Italy
|
48
|
Analysis of 13000 unique Citrus clusters associated with fruit quality, production and salinity tolerance. BMC Genomics 2007; 8:31. [PMID: 17254327 PMCID: PMC1796867 DOI: 10.1186/1471-2164-8-31] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Received: 07/19/2006] [Accepted: 01/25/2007] [Indexed: 12/19/2022] Open
Abstract
Background Improvement of Citrus, the most economically important fruit crop in the world, is extremely slow and inherently costly because of the long-term nature of tree breeding and an unusual combination of reproductive characteristics. Aside from disease resistance, major commercial traits in Citrus are improved fruit quality, higher yield and tolerance to environmental stresses, especially salinity. Results A normalized full-length and 9 standard cDNA libraries were generated, representing particular treatments and tissues from selected varieties (Citrus clementina and C. sinensis) and rootstocks (C. reshni and C. sinensis × Poncirus trifoliata) differing in fruit quality, resistance to abscission, and tolerance to salinity. The goal of this work was to provide a large expressed sequence tag (EST) collection enriched with transcripts related to these well-appreciated agronomical traits. Towards this end, more than 54000 ESTs derived from these libraries were analyzed and annotated. Assembly of 52626 useful sequences generated 15664 putative transcription units distributed in 7120 contigs and 8544 singletons. BLAST annotation produced significant hits for more than 80% of the hypothetical transcription units and suggested that 647 of these might be Citrus-specific unigenes. The unigene set, composed of ~13000 putative different transcripts, including more than 5000 novel Citrus genes, was assigned putative functions based on similarity, GO annotations and protein domains. Conclusion Comparative genomics with Arabidopsis revealed the presence of putative conserved orthologs and single-copy genes in Citrus and also the occurrence of both gene duplication events and an increased number of genes for specific pathways. In addition, phylogenetic analysis performed on the ammonium transporter family and glycosyl transferase family 20 suggested the existence of Citrus paralogs.
Analysis of the Citrus gene space showed that the most important metabolic pathways known to affect fruit quality were represented in the unigene set. Overall, the similarity analyses indicated that the sequences of the genes belonging to these varieties and rootstocks were essentially identical, suggesting that the differential behaviour of these species cannot be attributed to major sequence divergences. This Citrus EST assembly contributes both crucial information for discovering genes of agronomical interest and tools for genetic and genomic analyses, such as the development of new markers and microarrays.
|
49
|
Sipe CW, Dondeti VR, Saha MS. In silico gene selection for custom oligonucleotide microarray design. Methods Mol Biol 2007; 382:417-428. [PMID: 18220246 DOI: 10.1007/978-1-59745-304-2_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
A method for systematically selecting the large number of sequences needed to custom design an oligonucleotide microarray is presented. This approach uses a Perl script to query sequence databases with gene lists obtained from previously designed (and publicly available) microarrays. Homologous sequences passing a user-defined threshold are returned and stored in a candidate gene database. Using this versatile technique, microarrays can be designed for any organism having sequence data. In addition, the ability to select specific input gene lists allows the design of microarrays tailored to address questions pertaining to specific pathways or processes. Given recent concerns about the accuracy of annotation in public sequence databases, it is also necessary to confirm the correct orientation of candidate sequences. This step is performed by a second Perl script that extracts protein similarity information from individual Unigene records, checks for consistency of features, and adds this information to the candidate gene database. Discrepancies between the orientation determined from protein similarities and a sequence's assigned orientation are readily apparent when querying the candidate gene database.
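The two-step workflow described in this abstract (threshold filtering into a candidate gene database, then an orientation check) can be sketched roughly as follows. The original scripts are written in Perl and are not reproduced here; this is a Python illustration in which the hit records, the use of an e-value as the threshold measure, the table layout and all function names are hypothetical assumptions.

```python
import sqlite3

# Hypothetical search hits:
# (query_gene, hit_accession, e_value, assigned_strand, protein_strand)
HITS = [
    ("geneA", "ACC001", 1e-40, "+", "+"),
    ("geneB", "ACC002", 1e-3, "+", "+"),   # too weak, filtered out below
    ("geneC", "ACC003", 1e-25, "+", "-"),  # passes, but orientations disagree
]


def build_candidate_db(hits, max_evalue=1e-10):
    """Store hits that pass the user-defined threshold in an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE candidates (query TEXT, accession TEXT, evalue REAL,"
        " assigned_strand TEXT, protein_strand TEXT)"
    )
    passing = [h for h in hits if h[2] <= max_evalue]
    db.executemany("INSERT INTO candidates VALUES (?, ?, ?, ?, ?)", passing)
    return db


def orientation_conflicts(db):
    """List accessions whose assigned strand differs from the strand
    implied by protein similarity."""
    rows = db.execute(
        "SELECT accession FROM candidates"
        " WHERE assigned_strand <> protein_strand ORDER BY accession"
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = build_candidate_db(HITS)
    print(orientation_conflicts(db))
```

Keeping both orientation fields in the same table means a single query surfaces the discrepancies, which mirrors the abstract's point that conflicts become apparent by querying the candidate gene database.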
Affiliation(s)
- Conor W Sipe
- Department of Biology, College of William and Mary, Williamsburg, VA, USA
|
50
|
Abstract
Genomics and bioinformatics have great potential to help address numerous topics in ecology and evolution. Expressed sequence tags (ESTs) can bridge genomics and molecular ecology because they can provide a means of accessing the gene space of almost any organism. We review how ESTs have been used in molecular ecology research in the last several years by providing sequence data for the design of molecular markers, genome-wide studies of gene expression and selection, the identification of candidate genes underlying adaptation, and the basis for studies of gene family and genome evolution. Given the tremendous recent advances in inexpensive sequencing technologies, we predict that molecular ecologists will increasingly be developing and using EST collections in the years to come. With this in mind, we close our review by discussing aspects of EST resource development of particular relevance for molecular ecologists.
Affiliation(s)
- Amy Bouck
- Department of Biology, Box 90338, Duke University, Durham, NC 27708, USA.
|