1
Clark S, Yu F, Gu L, Min XJ. Expanding Alternative Splicing Identification by Integrating Multiple Sources of Transcription Data in Tomato. Front Plant Sci 2019; 10:689. [PMID: 31191588 PMCID: PMC6546887 DOI: 10.3389/fpls.2019.00689] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/20/2019] [Accepted: 05/08/2019] [Indexed: 05/17/2023]
Abstract
Tomato (Solanum lycopersicum) is an important vegetable and fruit crop. Its genome has been completely sequenced, and a large number of expressed sequence tags (ESTs) and short reads generated by RNA sequencing (RNA-seq) technologies are also available. Mapping transcripts, including mRNA sequences, ESTs, and RNA-seq reads, to the genome allows the identification of pre-mRNA alternative splicing (AS), a post-transcriptional process that generates two or more RNA isoforms from one pre-mRNA transcript. We comprehensively analyzed the AS landscape in tomato by integrating genome mapping information for all available mRNAs and ESTs with mapping information for RNA-seq reads collected from 27 published projects. A total of 369,911 AS events were identified from 34,419 genomic loci involving 161,913 transcripts. Among the basic AS events, intron retention was the most prevalent type (18.9%), followed by alternative acceptor site (12.9%) and alternative donor site (7.3%), with exon skipping the least common type (6.0%). Complex AS types comprising two or more basic events accounted for 54.9% of total AS events. Of 35,768 annotated protein-coding gene models, 23,233 were found to have pre-mRNAs generating AS isoform transcripts; the estimated AS rate in tomato was thus 65.0%. The list of identified AS genes with their corresponding transcript isoforms serves as a catalog for further detailed examination of gene functions in tomato biology. The post-transcriptional information is also expected to be useful in improving the predicted gene models in tomato. The sequence and annotation information can be accessed at the Plant Alternative Splicing Database (http://proteomics.ysu.edu/altsplice).
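The percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only recomputes figures stated in the abstract; the variable names are illustrative and are not taken from the paper.

```python
# Figures quoted in the abstract above; names are illustrative only.
as_gene_models = 23_233          # gene models with AS isoform transcripts
annotated_gene_models = 35_768   # annotated protein-coding gene models

# Estimated AS rate as a percentage of annotated gene models.
as_rate = 100 * as_gene_models / annotated_gene_models

# Share of each AS event type (percent of all AS events).
event_shares = {
    "intron retention": 18.9,
    "alternative acceptor site": 12.9,
    "alternative donor site": 7.3,
    "exon skipping": 6.0,
    "complex": 54.9,
}
total_share = sum(event_shares.values())

print(f"AS rate: {as_rate:.1f}%")
print(f"Sum of event-type shares: {total_share:.1f}%")
```

The four basic types plus the complex type account for all reported events, which is consistent with the abstract treating "complex" as a separate category rather than a subset of the basic ones.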
Affiliation(s)
- Sarah Clark
  - Department of Biological Sciences, Youngstown State University, Youngstown, OH, United States
- Feng Yu
  - Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH, United States
- Lianfeng Gu
  - Basic Forestry and Proteomics Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, China
- Xiang Jia Min
  - Department of Biological Sciences, Youngstown State University, Youngstown, OH, United States
  - *Correspondence: Xiang Jia Min,
2
Tao Z, Liu GX, Cai L, Yu H, Min XJ, Gan HT, Yang K, Sq L, Yan J, Chen L, Tan QH, Wu JC, Huang XL. Characteristics of Small Intestinal Diseases on Single-Balloon Enteroscopy: A Single-Center Study Conducted Over 6 Years in China. Medicine (Baltimore) 2015; 94:e1652. [PMID: 26496270 PMCID: PMC4620798 DOI: 10.1097/md.0000000000001652] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
Abstract
The small intestine was long considered inaccessible. The development of single-balloon endoscopy has greatly improved the diagnosis and treatment possibilities for small intestinal diseases. In this study, we aimed to explore the demographic characteristics and small intestinal diseases of patients who underwent single-balloon enteroscopy between 2009 and 2014 at our endoscopy center. We determined the enteroscopic findings for each small intestinal disease and the most susceptible age groups. In total, 186 patients were included in the study. Their mean age was 45.87 ± 15.77 years. Patients who underwent single-balloon enteroscopy were found to have neoplasms (most common age group: 14-45 years, most common lesion location: jejunum), lymphoma (46-59 and 60-74 years, ileum), protuberant lesions (45-59 years, jejunum), inflammation (14-45 and 46-59 years, ileum), benign ulcers (14-45 years, jejunum), diverticulum (14-45 years, ileum), vascular malformations (60-74 years, jejunum), polyps (14-45 years, jejunum), Crohn's disease (14-45 years, jejunum), hookworm infection (14-45 years, jejunum), lipid pigmentation (14-45 and 46-59 years, jejunum), undetermined bleeding (46-59 years, ileum), or undetermined stenosis (31 years, duodenum). Each small intestinal disease had distinct enteroscopic findings.
Affiliation(s)
- Zhang Tao
  - From the Gastroenterology, West China Hospital, Sichuan University, Gastroenterology, Nanchong Central Hospital (ZT); Gastroenterology and Geriatrics, West China Hospital, Sichuan University (LGX, YK, YJ); Gastroenterology, West China Hospital, Sichuan University (CL, YH, LC, QHT, JCW, XLH); Endoscopy Center, West China Hospital, Sichuan University (MXJ); and Gastroenterology, Nanchong Central Hospital (LSQ)
3
Min XJ, Powell B, Braessler J, Meinken J, Yu F, Sablok G. Genome-wide cataloging and analysis of alternatively spliced genes in cereal crops. BMC Genomics 2015; 16:721. [PMID: 26391769 PMCID: PMC4578763 DOI: 10.1186/s12864-015-1914-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 04/13/2015] [Accepted: 09/09/2015] [Indexed: 11/10/2022]
Abstract
Background: Protein functional diversity at the post-transcriptional level is regulated through spliceosome-mediated pre-mRNA alternative splicing (AS), which has been widely demonstrated to be a key player in regulating functional diversity in plants. Identification and analysis of AS genes in cereal crop plants are critical for crop improvement and for understanding regulatory mechanisms.
Results: We carried out comparative analyses of the functional landscapes of AS using consensus assemblies of expressed sequence tags and available mRNA sequences in four cereal plants. We identified a total of 8,734 AS genes in Oryza sativa subspecies (ssp) japonica, 2,657 in O. sativa ssp indica, 3,971 in Sorghum bicolor, and 10,687 in Zea mays. Among the identified AS events, intron retention remained the dominant type, accounting for 23.5% in S. bicolor and up to 55.8% in O. sativa ssp indica. We identified a total of 887 AS genes that were conserved among Z. mays, S. bicolor, and O. sativa ssp japonica, and 248 AS genes were found to be conserved among all four studied species or ssp. Furthermore, we identified 53 AS genes conserved with Brachypodium distachyon. Gene Ontology classification of AS genes revealed functional assignment of these genes in many biological processes with diverse molecular functions.
Conclusions: AS is common in cereal plants. The AS genes identified in four cereal crops in this work provide the foundation for further studying the roles of AS in the regulation of cereal plant growth and development. The data can be accessed at the Plant Alternative Splicing Database (http://proteomics.ysu.edu/altsplice/).
Affiliation(s)
- Xiang Jia Min
  - Department of Biological Sciences, Youngstown State University, Youngstown, OH, 44555, USA
  - Center for Applied Chemical Biology, Youngstown State University, Youngstown, OH, 44555, USA
- Brian Powell
  - Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH, 44555, USA
- Jonathan Braessler
  - Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH, 44555, USA
- John Meinken
  - Center for Applied Chemical Biology, Youngstown State University, Youngstown, OH, 44555, USA
  - Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH, 44555, USA
  - Present address: Center for Health Informatics, University of Cincinnati, Cincinnati, OH, 45267-0840, USA
- Feng Yu
  - Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH, 44555, USA
- Gaurav Sablok
  - Plant Functional Biology and Climate Change Cluster (C3), University of Technology Sydney, PO Box 123, Broadway, NSW, 2007, Australia
4
Meinken J, Walker G, Cooper CR, Min XJ. MetazSecKB: the human and animal secretome and subcellular proteome knowledgebase. Database (Oxford) 2015; 2015:bav077. [PMID: 26255309 PMCID: PMC4529745 DOI: 10.1093/database/bav077] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 05/07/2015] [Accepted: 07/14/2015] [Indexed: 12/15/2022]
Abstract
The subcellular location of a protein is a key factor in determining the molecular function of the protein in an organism. MetazSecKB is a secretome and subcellular proteome knowledgebase specifically designed for metazoans, i.e. humans and animals. The protein sequence data, consisting of over 4 million entries with 121 species having a complete proteome, were retrieved from UniProtKB. Protein subcellular locations, including secreted and 15 other subcellular locations, were assigned based on either curated experimental evidence or prediction using seven computational tools. The protein or subcellular proteome data can be searched and downloaded using several different types of identifiers, gene name or keyword(s), and species. BLAST search and community annotation of subcellular locations are also supported. Our primary analysis revealed that proteome sizes, secretome sizes and other subcellular proteome sizes vary tremendously among animal species. The proportions of secretomes vary from 3 to 22% (average 8%) in metazoan species. The proportions of other major subcellular proteomes ranged from approximately 21–43% (average 31%) in cytoplasm, 20–37% (average 30%) in nucleus, 3–19% (average 12%) as plasma membrane proteins and 3–9% (average 6%) in mitochondria. We also compared the protein families in secretomes of different primates. The Gene Ontology and protein family domain analysis of human secreted proteins revealed that these proteins play important roles in regulation of human structure development, signal transduction, immune systems and many other biological processes. Database URL: http://proteomics.ysu.edu/secretomes/animal/index.php
Affiliation(s)
- John Meinken
  - Department of Computer Science and Information Systems, Center for Applied Chemical Biology and
- Gary Walker
  - Center for Applied Chemical Biology and Department of Biological Sciences, Youngstown State University, Youngstown, OH 44555, USA
- Chester R Cooper
  - Center for Applied Chemical Biology and Department of Biological Sciences, Youngstown State University, Youngstown, OH 44555, USA
- Xiang Jia Min
  - Center for Applied Chemical Biology and Department of Biological Sciences, Youngstown State University, Youngstown, OH 44555, USA
5
Abstract
Expressed sequence tags (ESTs) are a rich resource for identifying alternatively spliced (AS) genes. The ASFinder web server is designed to identify AS isoforms from EST-derived sequences. Two approaches are implemented in ASFinder. If no genomic sequences are provided, the server performs a local BLASTN search to identify AS isoforms from ESTs that have both ends aligned but an internal segment unaligned. Otherwise, ASFinder uses SIM4 to map ESTs to the genome; overlapping ESTs that map to the same genomic locus and have variable internal exon/intron boundaries are then identified as AS isoforms. The tool is available at http://proteomics.ysu.edu/tools/ASFinder.html.
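The genome-free approach described above (both ends aligned, an internal segment unaligned) can be illustrated with a small sketch. This is not ASFinder's code: the `Hsp` record, the function name, and the `min_insert` and `max_slack` thresholds are all assumptions made for illustration, and the sketch assumes same-strand alignment blocks that have already been parsed from a pairwise alignment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Hsp:
    """One aligned block between two ESTs (hypothetical record, 1-based inclusive)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def internal_unaligned_segment(
    hsps: Sequence[Hsp],
    min_insert: int = 10,   # assumed minimum length of the unaligned segment
    max_slack: int = 3,     # assumed tolerance on the contiguous sequence
) -> Optional[Tuple[str, int, int]]:
    """Return (sequence, start, end) of an internal segment that is present in
    one EST but absent from the other, or None if no such segment is found."""
    ordered = sorted(hsps, key=lambda h: h.q_start)
    for left, right in zip(ordered, ordered[1:]):
        q_gap = right.q_start - left.q_end - 1
        s_gap = right.s_start - left.s_end - 1
        # Query carries extra internal sequence; subject is contiguous.
        if q_gap >= min_insert and abs(s_gap) <= max_slack:
            return ("query", left.q_end + 1, right.q_start - 1)
        # Subject carries extra internal sequence; query is contiguous.
        if s_gap >= min_insert and abs(q_gap) <= max_slack:
            return ("subject", left.s_end + 1, right.s_start - 1)
    return None


# Toy example: the query has 50 extra bases between two aligned blocks.
example = [Hsp(1, 100, 1, 100), Hsp(151, 300, 101, 250)]
candidate = internal_unaligned_segment(example)
```

With the toy blocks above, the flanking regions align on both sequences while positions 101 to 150 of the query have no counterpart in the subject, which is the pattern the abstract describes for a candidate AS isoform pair.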
Affiliation(s)
- Xiang Jia Min
  - Center for Applied Chemical Biology, Department of Biological Sciences, Youngstown State University, Youngstown, OH 44555, USA.
6
Ming R, VanBuren R, Liu Y, Yang M, Han Y, Li LT, Zhang Q, Kim MJ, Schatz MC, Campbell M, Li J, Bowers JE, Tang H, Lyons E, Ferguson AA, Narzisi G, Nelson DR, Blaby-Haas CE, Gschwend AR, Jiao Y, Der JP, Zeng F, Han J, Min XJ, Hudson KA, Singh R, Grennan AK, Karpowicz SJ, Watling JR, Ito K, Robinson SA, Hudson ME, Yu Q, Mockler TC, Carroll A, Zheng Y, Sunkar R, Jia R, Chen N, Arro J, Wai CM, Wafula E, Spence A, Han Y, Xu L, Zhang J, Peery R, Haus MJ, Xiong W, Walsh JA, Wu J, Wang ML, Zhu YJ, Paull RE, Britt AB, Du C, Downie SR, Schuler MA, Michael TP, Long SP, Ort DR, Schopf JW, Gang DR, Jiang N, Yandell M, dePamphilis CW, Merchant SS, Paterson AH, Buchanan BB, Li S, Shen-Miller J. Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol 2013; 14:R41. [PMID: 23663246 PMCID: PMC4053705 DOI: 10.1186/gb-2013-14-5-r41] [Citation(s) in RCA: 273] [Impact Index Per Article: 24.8] [Received: 01/04/2013] [Accepted: 05/10/2013] [Indexed: 11/20/2022]
Abstract
Background: Sacred lotus is a basal eudicot with agricultural, medicinal, cultural and religious importance. It was domesticated in Asia about 7,000 years ago, and cultivated for its rhizomes and seeds as a food crop. It is particularly noted for its 1,300-year seed longevity and exceptional water repellency, known as the lotus effect. The latter property is due to the nanoscopic closely packed protuberances of its self-cleaning leaf surface, which have been adapted for the manufacture of a self-cleaning industrial paint, Lotusan.
Results: The genome of the China Antique variety of the sacred lotus was sequenced with Illumina and 454 technologies, at respective depths of 101× and 5.2×. The final assembly has a contig N50 of 38.8 kbp and a scaffold N50 of 3.4 Mbp, and covers 86.5% of the estimated 929 Mbp total genome size. The genome notably lacks the paleo-triplication observed in other eudicots, but reveals a lineage-specific duplication. The genome has evidence of slow evolution, with a 30% slower nucleotide mutation rate than observed in grape. Comparisons of the available sequenced genomes suggest a minimum gene set for vascular plants of 4,223 genes. Strikingly, the sacred lotus has 16 COG2132 multi-copper oxidase family proteins with root-specific expression; these are involved in root meristem phosphate starvation, reflecting adaptation to limited nutrient availability in an aquatic environment.
Conclusions: The slow nucleotide substitution rate makes the sacred lotus a better resource than the current standard, grape, for reconstructing the pan-eudicot genome, and should therefore accelerate comparative analysis between eudicots and monocots.
7
Abstract
Recently, Brachypodium distachyon has emerged as a model plant for studying monocot grasses and cereal crops. Using assembled expressed transcript sequences and subsequent mapping to the corresponding genome, we identified 1219 alternative splicing (AS) events spanning 2021 putative assembled transcripts generated from 941 genes. Approximately 6.3% of expressed genes are alternatively spliced in B. distachyon. We observed that a majority of the identified AS events were retained introns (55.5%), followed by alternative acceptor sites (16.7%). We also observed low percentages of exon skipping (5.0%) and alternative donor site events (8.8%). The 'complex event', which consists of a combination of two or more basic splicing events, accounted for ∼14.0%. Comparative AS transcript analysis revealed 163 homologous pairs between B. distachyon and Oryza sativa and 39 between B. distachyon and Arabidopsis thaliana. In all, we found 16 AS transcripts to be conserved in all 3 species. AS events and annotations of the related putative assembled transcripts can be systematically browsed at the Plant Alternative Splicing Database (http://proteomics.ysu.edu/altsplice/plant/).
Affiliation(s)
- Braden Walters
  - Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555, USA
- Gengkon Lum
  - Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555, USA
- Gaurav Sablok
  - Sustainable Agro-ecosystems and Bioresources Department, IASMA Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, San Michele all'Adige, Trentino 38010, Italy
- Xiang Jia Min
  - Center for Applied Chemical Biology, Department of Biological Sciences, Youngstown State University, Youngstown, OH 44555, USA
8
Wang J, Na JK, Yu Q, Gschwend AR, Han J, Zeng F, Aryal R, VanBuren R, Murray JE, Zhang W, Navajas-Pérez R, Feltus FA, Lemke C, Tong EJ, Chen C, Man Wai C, Singh R, Wang ML, Min XJ, Alam M, Charlesworth D, Moore PH, Jiang J, Paterson AH, Ming R. Sequencing papaya X and Yh chromosomes reveals molecular basis of incipient sex chromosome evolution. Proc Natl Acad Sci U S A 2012; 109:13710-5. [PMID: 22869747 PMCID: PMC3427123 DOI: 10.1073/pnas.1207833109] [Citation(s) in RCA: 194] [Impact Index Per Article: 16.2] [Indexed: 01/03/2023]
Abstract
Sex determination in papaya is controlled by a recently evolved XY chromosome pair, with two slightly different Y chromosomes controlling the development of males (Y) and hermaphrodites (Y(h)). To study the events of early sex chromosome evolution, we sequenced the hermaphrodite-specific region of the Y(h) chromosome (HSY) and its X counterpart, yielding an 8.1-megabase (Mb) HSY pseudomolecule, and a 3.5-Mb sequence for the corresponding X region. The HSY is larger than the X region, mostly due to retrotransposon insertions. The papaya HSY differs from the X region by two large-scale inversions, the first of which likely caused the recombination suppression between the X and Y(h) chromosomes, followed by numerous additional chromosomal rearrangements. Altogether, including the X and/or HSY regions, 124 transcription units were annotated, including 50 functional pairs present in both the X and HSY. Ten HSY genes had functional homologs elsewhere in the papaya autosomal regions, suggesting movement of genes onto the HSY, whereas the X region had none. Sequence divergence between 70 transcripts shared by the X and HSY revealed two evolutionary strata in the X chromosome, corresponding to the two inversions on the HSY, the older of which evolved about 7.0 million years ago. Gene content differences between the HSY and X are greatest in the older stratum, whereas the gene content and order of the collinear regions are identical. Our findings support theoretical models of early sex chromosome evolution.
Affiliation(s)
- Jianping Wang
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Jong-Kuk Na
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Qingyi Yu
  - Texas AgriLife Research Center, Department of Plant Pathology and Microbiology, Texas A&M University, Weslaco, TX 78596
  - Hawaii Agriculture Research Center, Kunia, HI 96759
- Andrea R. Gschwend
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Jennifer Han
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Fanchang Zeng
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Rishi Aryal
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Robert VanBuren
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Jan E. Murray
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Wenli Zhang
  - Department of Horticulture, University of Wisconsin, Madison, WI 53706
- F. Alex Feltus
  - Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30606
- Cornelia Lemke
  - Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30606
- Eric J. Tong
  - Hawaii Agriculture Research Center, Kunia, HI 96759
- Cuixia Chen
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Ching Man Wai
  - Hawaii Agriculture Research Center, Kunia, HI 96759
  - Department of Tropical Plants and Soil Sciences, University of Hawaii, Honolulu, HI 96822
- Ming-Li Wang
  - Hawaii Agriculture Research Center, Kunia, HI 96759
- Xiang Jia Min
  - Department of Biological Sciences, Youngstown State University, Youngstown, OH 44555
- Maqsudul Alam
  - Advanced Studies in Genomics, Proteomics and Bioinformatics, University of Hawaii, Honolulu, HI 96822
- Deborah Charlesworth
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
- Jiming Jiang
  - Department of Horticulture, University of Wisconsin, Madison, WI 53706
- Andrew H. Paterson
  - Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30606
- Ray Ming
  - Department of Plant Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
9
Abstract
The Fungal Secretome KnowledgeBase (FunSecKB) provides a resource of secreted fungal proteins, i.e. secretomes, identified from all available fungal protein data in the NCBI RefSeq database. The secreted proteins were identified using a well-evaluated computational protocol that includes SignalP, WolfPsort and Phobius for signal peptide or subcellular location prediction, TMHMM for identifying membrane proteins, and PS-Scan for identifying endoplasmic reticulum (ER) target proteins. The entries were mapped to the UniProt database, and any annotations of subcellular locations that were either manually curated or computationally predicted were included in FunSecKB. Through a web-based user interface, the database can be searched, browsed and downloaded by NCBI’s RefSeq accession or gi number, UniProt accession number, keyword or species. A BLAST utility was integrated to allow users to query the database by sequence similarity. A user submission tool was implemented to support community annotation of subcellular locations of fungal proteins. With the complete fungal data from RefSeq and associated web-based tools, FunSecKB will be a valuable resource for exploring the potential applications of fungal secreted proteins. Database URL: http://proteomics.ysu.edu/secretomes/fungi.php
Affiliation(s)
- Gengkon Lum
  - Department of Computer Science and Information Systems, Center for Applied Chemical Biology, Youngstown State University, Youngstown, OH 44555, USA
10
Albu M, Min XJ, Golding GB, Hickey D. Nucleotide substitution bias within the genus Drosophila affects the pattern of proteome evolution. Genome Biol Evol 2009; 1:288-93. [PMID: 20333198 PMCID: PMC2817423 DOI: 10.1093/gbe/evp028] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Accepted: 07/30/2009] [Indexed: 11/14/2022]
Abstract
The availability of complete genome sequences for 12 Drosophila species provides an unprecedented resource for large-scale studies of genome evolution. In this study, we looked for correlated shifts in the patterns of genome and proteome evolution within the genus Drosophila. Specifically, we asked if the nucleotide composition of the Drosophila willistoni genome (which is significantly less GC rich than the other 11 sequenced Drosophila genomes) is reflected in an altered pattern of amino acid substitutions in the encoded proteins. Our results show that this is indeed the case: there are large and highly significant asymmetries in the patterns of amino acid substitution between D. willistoni and Drosophila melanogaster, and they are in the direction predicted by the nucleotide biases. The implication of this result, combined with previous studies on long-term proteome evolution, is that substitutional biases at the DNA level can be a major factor in determining both the long-term and the short-term directions of proteome evolution.
Affiliation(s)
- Mihai Albu
  - Department of Biology, Concordia University, Montréal, Québec, Canada
11
Abstract
The relative rates of nucleotide substitution at synonymous and nonsynonymous sites within protein-coding regions have been widely used to infer the action of natural selection from comparative sequence data. It is known, however, that mutational and repair biases can affect rates of evolution at both synonymous and nonsynonymous sites. More importantly, it is also known that synonymous sites are particularly prone to the effects of nucleotide bias. This means that nucleotide biases may affect the calculated ratio of substitution rates at synonymous and nonsynonymous sites. Using a large data set of animal mitochondrial sequences, we demonstrate that this is, in fact, the case. Highly biased nucleotide sequences are characterized by significantly elevated dN/dS ratios, but only when the nucleotide frequencies are not taken into account. When the analysis is repeated taking the nucleotide frequencies at each codon position into account, such elevated ratios disappear. These results suggest that the recently reported differences in dN/dS ratios between vertebrate and invertebrate mitochondrial sequences could be explained by variations in mitochondrial nucleotide frequencies rather than the effects of positive Darwinian selection.
12
Abstract
Variations in GC content between genomes have been extensively documented. Genomes with comparable GC contents can, however, still differ in the apportionment of the G and C nucleotides between the two DNA strands. This asymmetric strand bias is known as GC skew. Here, we have investigated the impact of differences in nucleotide skew on the amino acid composition of the encoded proteins. We compared orthologous genes between animal mitochondrial genomes that show large differences in GC and AT skews. Specifically, we compared the mitochondrial genomes of mammals, which are characterized by a negative GC skew and a positive AT skew, to those of flatworms, which show the opposite skews for both GC and AT base pairs. We found that the mammalian proteins are highly enriched in amino acids encoded by CA-rich codons (as predicted by their negative GC and positive AT skews), whereas their flatworm orthologs were enriched in amino acids encoded by GT-rich codons (also as predicted from their skews). We found that these differences in mitochondrial strand asymmetry (measured as GC and AT skews) can have very large, predictable effects on the composition of the encoded proteins.
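As a concrete illustration of the skew measures mentioned above, the sketch below uses the commonly used definitions GC skew = (G − C)/(G + C) and AT skew = (A − T)/(A + T), computed on a single strand. The abstract does not spell out its formulas, so treating these as the ones used is an assumption; the function name and the toy sequence are made up for illustration.

```python
from collections import Counter
from typing import Tuple


def strand_skews(seq: str) -> Tuple[float, float]:
    """Return (GC skew, AT skew) for one DNA strand.

    GC skew = (G - C) / (G + C); AT skew = (A - T) / (A + T).
    A skew is reported as 0.0 when its denominator is zero.
    """
    counts = Counter(seq.upper())
    g, c, a, t = counts["G"], counts["C"], counts["A"], counts["T"]
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    return gc_skew, at_skew


# Toy sequence (not real data): C outnumbers G and A outnumbers T,
# giving the negative GC skew and positive AT skew described for mammals.
gc_skew, at_skew = strand_skews("AAACCCGT")
```

A strand with the opposite pattern (G over C, T over A) would flip both signs, which corresponds to the flatworm case in the abstract.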
Affiliation(s)
- Xiang Jia Min
  - Department of Biology, Concordia University, 7141 Sherbrooke West, Montreal, Quebec, Canada H4B 1R6
13
Abstract
DNA barcodes have achieved prominence as a tool for species-level identifications. Consequently, there is a rapidly growing database of these short sequences from a wide variety of taxa. In this study, we have analyzed the correlation between the nucleotide content of the short DNA barcode sequences and the genomes from which they are derived. Our results show that such short sequences can yield important, and surprisingly accurate, information about the composition of the entire genome. In other words, for unsequenced genomes, the DNA barcodes can provide a quick preview of the whole genome composition.
Affiliation(s)
- Xiang Jia Min
  - Department of Biology, Concordia University, Montreal, Quebec, Canada
- Donal A. Hickey
  - Department of Biology, Concordia University, Montreal, Quebec, Canada
  - * To whom correspondence should be addressed. E-mail:
14
Abstract
DNA barcoding shows enormous promise for the rapid identification of organisms at the species level. There has been much recent debate, however, about the need for longer barcode sequences, especially when these sequences are used to construct molecular phylogenies. Here, we have analysed a set of fungal mitochondrial sequences — of various lengths — and we have monitored the effect of reducing sequence length on the utility of the data for both species identification and phylogenetic reconstruction. Our results demonstrate that reducing sequence length has a profound effect on the accuracy of resulting phylogenetic trees, but surprisingly short sequences still yield accurate species identifications. We conclude that the standard short barcode sequences (∼600 bp) are not suitable for inferring accurate phylogenetic relationships, but they are sufficient for species identification among the fungi.
Affiliation(s)
- Xiang Jia Min
  - Department of Biology, Concordia University 7141 Sherbrooke West, Montreal, Quebec, Canada H4B 1R6
15
Semova N, Storms R, John T, Gaudet P, Ulycznyj P, Min XJ, Sun J, Butler G, Tsang A. Generation, annotation, and analysis of an extensive Aspergillus niger EST collection. BMC Microbiol 2006; 6:7. [PMID: 16457709 PMCID: PMC1434744 DOI: 10.1186/1471-2180-6-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Received: 10/03/2005] [Accepted: 02/02/2006] [Indexed: 11/10/2022]
Abstract
Background: Aspergillus niger, a saprophyte commonly found on decaying vegetation, is widely used and studied for industrial purposes. Despite its place as one of the most important organisms for commercial applications, the lack of available information about its genetic makeup limits research with this filamentous fungus.
Results: We present here the analysis of 12,820 expressed sequence tags (ESTs) generated from A. niger cultured under seven different growth conditions. These ESTs identify about 5,108 genes, of which 44.5% code for proteins sharing similarity (E ≤ 1e-5) with GenBank entries of known function, 38% code for proteins that share similarity only with GenBank entries of unknown function, and 17.5% encode proteins that do not have a GenBank homolog. Using the Gene Ontology hierarchy, we present a first classification of the A. niger proteins encoded by these genes and compare its protein repertoire with those of other well-studied fungal species. We have established a searchable web-based database that includes the EST and derived contig sequences and their annotation. Details about this project and access to the annotated A. niger database are available.
Conclusion: This EST collection and its annotation provide a significant resource for fundamental and applied research with A. niger. The gene set identified in this manuscript will be highly useful in the annotation of the genome sequence of A. niger. The genes described in the manuscript, especially those encoding hydrolytic enzymes, will provide a valuable source for researchers interested in enzyme properties and applications.
Affiliation(s)
- Natalia Semova
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
- Reginald Storms
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
  - Department of Biology, Concordia University, Montreal, Canada
- Tricia John
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
- Pascale Gaudet
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
  - Northwestern University, 676 N. St. Clair Street, Chicago, IL 60611
- Peter Ulycznyj
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
- Xiang Jia Min
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
- Jian Sun
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
- Greg Butler
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
  - Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
- Adrian Tsang
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Canada
  - Department of Biology, Concordia University, Montreal, Canada
|
16
|
O'Toole N, Min XJ, Butler G, Storms R, Tsang A. Sequence-Based Analysis of Fungal Secretomes. Applied Mycology and Biotechnology 2006. [DOI: 10.1016/s1874-5334(06)80015-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2023]
|
17
|
Abstract
OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in FASTA format, each with a definition line that includes the query ID, the translation reading frame, and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly for large-scale EST projects. OrfPredictor is available at https://fungalgenome.concordia.ca/tools/OrfPredictor.html.
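The two-branch logic described in this abstract can be illustrated with a minimal Python sketch. It is not OrfPredictor's code: the abstract does not say which intrinsic signals are used, so the longest stop-free stretch over all six frames stands in as an assumed heuristic, and the function names and the definition-line layout (`frame=`, `begin=`, `end=`) are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (stop codons map to "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf_in_frame(seq, frame):
    """Longest stop-free stretch in one frame (+1..+3 or -1..-3).

    Returns (begin, end, peptide), with 1-based inclusive nucleotide
    coordinates on the scanned strand. No ATG is required, because ESTs
    are often partial (an assumption of this sketch).
    """
    strand = seq if frame > 0 else reverse_complement(seq)
    offset = abs(frame) - 1
    best = (0, 0, "")
    start, peptide = offset, []
    for i in range(offset, len(strand) - 2, 3):
        aa = CODON_TABLE.get(strand[i:i + 3], "X")  # "X" for ambiguous codons
        if aa == "*":
            if len(peptide) > len(best[2]):
                best = (start + 1, i, "".join(peptide))
            start, peptide = i + 3, []
        else:
            peptide.append(aa)
    if len(peptide) > len(best[2]):  # stretch that runs to the sequence end
        best = (start + 1, start + 3 * len(peptide), "".join(peptide))
    return best


def predict_coding_region(query_id, seq, blastx_frame=None):
    """Use the BLASTX frame when one is supplied, else scan all six frames."""
    seq = seq.upper()
    frames = [blastx_frame] if blastx_frame else [1, 2, 3, -1, -2, -3]
    frame, (begin, end, peptide) = max(
        ((f, longest_orf_in_frame(seq, f)) for f in frames),
        key=lambda item: len(item[1][2]),
    )
    return f">{query_id} frame={frame:+d} begin={begin} end={end}\n{peptide}"


print(predict_coding_region("est1", "CCATGGCTAAATAGCC", blastx_frame=3))
```

Passing `blastx_frame` restricts the search to the frame reported by an alignment; omitting it falls back to the six-frame scan.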
Affiliation(s)
- Xiang Jia Min
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Quebec, Canada H4B 1R6
  - To whom correspondence should be addressed. Tel: +1 514 848 2424, ext. 5791; Fax: +1 514 848 4504
- Gregory Butler
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Quebec, Canada H4B 1R6
  - Department of Computer Science, Concordia University, Montreal, Quebec, Canada H4B 1R6
- Reginald Storms
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Quebec, Canada H4B 1R6
  - Department of Biology, Concordia University, Montreal, Quebec, Canada H4B 1R6
- Adrian Tsang
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Quebec, Canada H4B 1R6
  - Department of Biology, Concordia University, Montreal, Quebec, Canada H4B 1R6
|
18
|
Abstract
TargetIdentifier is a webserver that identifies full-length cDNA sequences from expressed sequence tag (EST)-derived contig and singleton data. To accomplish this, TargetIdentifier uses BLASTX alignments as a guide to locate protein-coding regions and potential start and stop codons. This information is then used to determine whether the EST-derived sequences include their translation start codons. The algorithm also uses the BLASTX output to assign putative functions to the query sequences. The server is available at .
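As a rough illustration of how a BLASTX alignment could guide a start-codon check, the Python sketch below walks upstream, in frame, from the start of the alignment on the query. The abstract does not give TargetIdentifier's actual decision rules, so the rule used here, the category labels, and the names `find_upstream_atg` and `classify_est` are all assumptions. Only forward-frame hits are handled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_upstream_atg(seq, q_start):
    """Walk upstream, codon by codon, from the first aligned codon.

    q_start is the 1-based query position where the BLASTX alignment
    begins. Returns the 1-based position of the most upstream in-frame
    ATG found before an in-frame stop codon, or None if there is none.
    """
    seq = seq.upper()
    atg_pos = None
    i = q_start - 1  # 0-based index of the first aligned codon
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            atg_pos = i + 1
        i -= 3
    return atg_pos


def classify_est(seq, q_start, s_start):
    """Label a query as full-length, 5'-truncated, or ambiguous.

    s_start is the 1-based residue of the subject protein where the
    alignment begins; it is used to project where the subject's first
    residue would fall on the query (an assumed heuristic).
    """
    atg = find_upstream_atg(seq, q_start)
    if atg is not None:
        return ("full-length", atg)
    expected_met = q_start - 3 * (s_start - 1)
    if expected_met < 1:
        # The subject's N-terminus projects beyond the 5' end of the query.
        return ("5'-truncated", None)
    return ("ambiguous", None)


print(classify_est("GGCCTAGATGGCTAAAGGT", q_start=11, s_start=2))
print(classify_est("GCTAAAGGTCCA", q_start=1, s_start=40))
```

Taking the most upstream ATG rather than the nearest one is a design choice of this sketch; a production tool would also need to handle reverse-strand frames and alignment gaps.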
Affiliation(s)
- Xiang Jia Min
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, Quebec H4B 1R6, Canada.
|