1
Size does matter: 18 amino acids at the N-terminal tip of an amino acid transporter in Leishmania determine substrate specificity. Sci Rep 2015; 5:16289. [PMID: 26549185 PMCID: PMC4637868 DOI: 10.1038/srep16289] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/29/2015] [Accepted: 09/29/2015] [Indexed: 11/17/2022] Open
Abstract
Long N-terminal tails of amino acid transporters are known to act as sensors of the internal pool of amino acids and as positive regulators of substrate flux rate. In this study we establish that N-termini of amino acid transporters can also determine substrate specificity. We show that due to alternative trans splicing, the human pathogen Leishmania naturally expresses two variants of the proline/alanine transporter, one 18 amino acid shorter than the other. We demonstrate that the longer variant (LdAAP24) translocates both proline and alanine, whereas the shorter variant (∆18LdAAP24) translocates just proline. Remarkably, co-expressing the hydrophilic N-terminal peptide of the long variant with ∆18LdAAP24 was found to recover alanine transport. This restoration of alanine transport could be mediated by a truncated N-terminal tail, though truncations exceeding half of the tail length were no longer functional. Taken together, the data indicate that the first 18 amino acids of the negatively charged N-terminal LdAAP24 tail are required for alanine transport and may facilitate the electrostatic interactions of the entire negatively charged N-terminal tail with the positively charged internal loops in the transmembrane domain, as this mechanism has been shown to underlie regulation of substrate flux rate for other transporters.
2
Evolutionary evidence for alternative structure in RNA sequence co-variation. PLoS Comput Biol 2013; 9:e1003152. [PMID: 23935473 PMCID: PMC3723493 DOI: 10.1371/journal.pcbi.1003152] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 03/12/2013] [Accepted: 06/05/2013] [Indexed: 02/06/2023] Open
Abstract
Sequence conservation and co-variation of base pairs are hallmarks of structured RNAs. For certain RNAs (e.g. riboswitches), a single sequence must adopt at least two alternative secondary structures to effectively regulate the message. If alternative secondary structures are important to the function of an RNA, we expect to observe evolutionary co-variation supporting multiple conformations. We set out to characterize the evolutionary co-variation supporting alternative conformations in riboswitches to determine the extent to which alternative secondary structures are conserved. We found strong co-variation support for the terminator, P1, and anti-terminator stems in the purine riboswitch by extending alignments to include terminator sequences. When we performed Boltzmann suboptimal sampling on purine riboswitch sequences with terminators, we found that these sequences appear to have evolved to favor specific alternative conformations. We extended our analysis of co-variation to classic alignments of group I/II introns, tRNA, and other classes of riboswitches. In a majority of these RNAs, we found evolutionary evidence for alternative conformations that are compatible with the Boltzmann suboptimal ensemble. Our analyses suggest that alternative conformations are selected for and thus likely play functional roles in even the most structured of RNAs.
RNA (Ribonucleic Acid) is a messenger of genetic information, master regulator, and catalyst in the cell. To carry out its function, RNA can fold into complex three-dimensional structures. Certain classes of RNAs, called riboswitches, adopt at least two alternative structures to act as a switch. We set out to detect the evolutionary signal for alternative structures in riboswitches, as we hypothesize that these RNA sequences must have evolved to allow both conformations. We find that indeed such signals exist when we compare the sequences of riboswitches from multiple species. When we extend this analysis to other RNA regulators in the cell that are not thought of as switches, we detect equivalent evolutionary support for alternative structures. Viewed through the lens of evolutionary structure conservation, RNA sequences appear to have adapted to adopt multiple conformations.
3
Wang X, Zhong M, Liu Q, Aly SM, Wu C, Wen J. Molecular characterization of the carbon dioxide receptor in the oriental latrine fly, Chrysomya megacephala (Diptera: Calliphoridae). Parasitol Res 2013; 112:2763-71. [DOI: 10.1007/s00436-013-3410-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 01/15/2013] [Accepted: 03/22/2013] [Indexed: 12/01/2022]
4
Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigó R, Hubbard TJ. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res 2013; 22:1760-74. [PMID: 22955987 PMCID: PMC3431492 DOI: 10.1101/gr.135350.111] [Citation(s) in RCA: 3391] [Impact Index Per Article: 282.6] [Indexed: 02/07/2023]
Abstract
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Affiliation(s)
- Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
5
Frankish A, Mudge JM, Thomas M, Harrow J. The importance of identifying alternative splicing in vertebrate genome annotation. Database (Oxford) 2012; 2012:bas014. [PMID: 22434846 PMCID: PMC3308168 DOI: 10.1093/database/bas014] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 10/15/2011] [Revised: 02/09/2012] [Accepted: 02/10/2012] [Indexed: 12/17/2022]
Abstract
While alternative splicing (AS) can potentially expand the functional repertoire of vertebrate genomes, relatively few AS transcripts have been experimentally characterized. We describe our detailed manual annotation of vertebrate genomes, which is generating a publicly available geneset rich in AS. In order to achieve this we have adopted a highly sensitive approach to annotating gene models supported by correctly mapped, canonically spliced transcriptional evidence combined with a highly cautious approach to adding unsupported extensions to models and making decisions on their functional potential. We use information about the predicted functional potential and structural properties of every AS transcript annotated at a protein-coding or non-coding locus to place them into one of eleven subclasses. We describe the incorporation of new sequencing and proteomics technologies into our annotation pipelines, which are used to identify and validate AS. Combining all data sources has led to the production of a rich geneset containing an average of 6.3 AS transcripts for every human multi-exon protein-coding gene. The datasets produced have proved very useful in providing context to studies investigating the functional potential of genes and the effect that variation may have on gene structure and function. DATABASE URL: http://www.ensembl.org/index.html, http://vega.sanger.ac.uk/index.html.
Affiliation(s)
- Adam Frankish
- Human and Vertebrate Analysis and Annotation Team, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
6
MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IHA, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB, Tyler-Smith C. A systematic survey of loss-of-function variants in human protein-coding genes. Science 2012; 335:823-8. [PMID: 22344438 PMCID: PMC3299548 DOI: 10.1126/science.1215040] [Citation(s) in RCA: 906] [Impact Index Per Article: 69.7] [Indexed: 01/17/2023]
Abstract
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
7
Djebali S, Lagarde J, Kapranov P, Lacroix V, Borel C, Mudge JM, Howald C, Foissac S, Ucla C, Chrast J, Ribeca P, Martin D, Murray RR, Yang X, Ghamsari L, Lin C, Bell I, Dumais E, Drenkow J, Tress ML, Gelpí JL, Orozco M, Valencia A, van Berkum NL, Lajoie BR, Vidal M, Stamatoyannopoulos J, Batut P, Dobin A, Harrow J, Hubbard T, Dekker J, Frankish A, Salehi-Ashtiani K, Reymond A, Antonarakis SE, Guigó R, Gingeras TR. Evidence for transcript networks composed of chimeric RNAs in human cells. PLoS One 2012; 7:e28213. [PMID: 22238572 PMCID: PMC3251577 DOI: 10.1371/journal.pone.0028213] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 07/06/2011] [Accepted: 11/03/2011] [Indexed: 12/03/2022] Open
Abstract
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.
Affiliation(s)
- Sarah Djebali
- Bioinformatics and Genomics, Centre for Genomic Regulation and Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
8
Molecular characterization and expression pattern of an odorant receptor from the myiasis-causing blowfly, Lucilia sericata (Diptera: Calliphoridae). Parasitol Res 2011; 110:843-51. [DOI: 10.1007/s00436-011-2563-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 05/10/2011] [Accepted: 07/13/2011] [Indexed: 10/18/2022]
9
Kaczmarek K, Studencka M, Meinhardt A, Wieczerzak K, Thoms S, Engel W, Grzmil P. Overexpression of peroxisomal testis-specific 1 protein induces germ cell apoptosis and leads to infertility in male mice. Mol Biol Cell 2011; 22:1766-79. [PMID: 21460186 PMCID: PMC3093327 DOI: 10.1091/mbc.e09-12-0993] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022] Open
Abstract
Peroxisomal testis-specific 1 gene (Pxt1) is the only male germ cell-specific gene that encodes a peroxisomal protein known to date. To elucidate the role of Pxt1 in spermatogenesis, we generated transgenic mice expressing a c-MYC-PXT1 fusion protein under the control of the PGK2 promoter. Overexpression of Pxt1 resulted in induction of male germ cells' apoptosis mainly in primary spermatocytes, finally leading to male infertility. This prompted us to analyze the proapoptotic character of mouse PXT1, which harbors a BH3-like domain in the N-terminal part. In different cell lines, the overexpression of PXT1 also resulted in a dramatic increase of apoptosis, whereas the deletion of the BH3-like domain significantly reduced cell death events, thereby confirming that the domain is functional and essential for the proapoptotic activity of PXT1. Moreover, we demonstrated that PXT1 interacts with apoptosis regulator BAT3, which, if overexpressed, can protect cells from the PXT1-induced apoptosis. The PXT1-BAT3 association leads to PXT1 relocation from the cytoplasm to the nucleus. In summary, we demonstrated that PXT1 induces apoptosis via the BH3-like domain and that this process is inhibited by BAT3.
Affiliation(s)
- Karina Kaczmarek
- Institute of Human Genetics, Georg-August-University of Göttingen, 37073 Göttingen, Germany
10
FACT: functional annotation transfer between proteins with similar feature architectures. BMC Bioinformatics 2010; 11:417. [PMID: 20696036 PMCID: PMC2931517 DOI: 10.1186/1471-2105-11-417] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 03/15/2010] [Accepted: 08/09/2010] [Indexed: 11/24/2022] Open
Abstract
Background The increasing number of sequenced genomes provides the basis for exploring the genetic and functional diversity within the tree of life. Only a tiny fraction of the encoded proteins undergoes a thorough experimental characterization. For the remainder, bioinformatics annotation tools are the only means to infer their function. Exploiting significant sequence similarities to already characterized proteins, commonly taken as evidence for homology, is the prevalent method to deduce functional equivalence. Such methods fail when homologs are too diverged, or when they have assumed a different function. Finally, due to convergent evolution, functional equivalence is not necessarily linked to common ancestry. Therefore complementary approaches are required to identify functional equivalents. Results We present the Feature Architecture Comparison Tool (http://www.cibiv.at/FACT) to search for functionally equivalent proteins. FACT uses the similarity between feature architectures of two proteins, i.e., the arrangements of functional domains, secondary structure elements and compositional properties, as a proxy for their functional equivalence. A scoring function measures feature architecture similarities, which enables searching for functional equivalents in entire proteomes. Our evaluation of 9,570 EC classified enzymes revealed that FACT, using the full feature set, outperformed the existing architecture-based approaches by identifying significantly more functional equivalents as highest scoring proteins. We show that FACT can identify functional equivalents that share no significant sequence similarity. However, when the highest scoring protein of FACT is also the protein with the highest local sequence similarity, it is in 99% of the cases functionally equivalent to the query. We demonstrate the versatility of FACT by identifying a missing link in the yeast glutathione metabolism and also by searching for the human GolgA5 equivalent in Trypanosoma brucei.
Conclusions FACT facilitates a quick and sensitive search for functionally equivalent proteins in entire proteomes. FACT is complementary to approaches using sequence similarity to identify proteins with the same function. Thus, FACT is particularly useful when functional equivalents need to be identified in evolutionarily distant species, or when functional equivalents are not homologous. The most reliable annotation transfers, however, are achieved when feature architecture similarity and sequence similarity are jointly taken into account.
11
Sirota FL, Ooi HS, Gattermayer T, Schneider G, Eisenhaber F, Maurer-Stroh S. Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset. BMC Genomics 2010; 11 Suppl 1:S15. [PMID: 20158872 PMCID: PMC2822529 DOI: 10.1186/1471-2164-11-s1-s15] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 12/26/2023] Open
Abstract
BACKGROUND Algorithms designed to predict protein disorder play an important role in structural and functional genomics, as disordered regions have been reported to participate in important cellular processes. Consequently, several methods with different underlying principles for disorder prediction have been independently developed by various groups. For assessing their usability in automated workflows, we are interested in identifying parameter settings and threshold selections, under which the performance of these predictors becomes directly comparable. RESULTS First, we derived a new benchmark set that accounts for different flavours of disorder complemented with a similar amount of order annotation derived for the same protein set. We show that, using the recommended default parameters, the programs tested are producing a wide range of predictions at different levels of specificity and sensitivity. We identify settings, in which the different predictors have the same false positive rate. We assess conditions when sets of predictors can be run together to derive consensus or complementary predictions. This is useful in the framework of proteome-wide applications where high specificity is required such as in our in-house sequence analysis pipeline and the ANNIE webserver. CONCLUSIONS This work identifies parameter settings and thresholds for a selection of disorder predictors to produce comparable results at a desired level of specificity over a newly derived benchmark dataset that accounts equally for ordered and disordered regions of different lengths.
Affiliation(s)
- Fernanda L Sirota
- Biomolecular Function Discovery Division, Bioinformatics Institute (BII), Agency for Science Technology and Research (A*STAR), Matrix, Singapore.
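The abstract of this entry describes finding settings under which different disorder predictors have the same false positive rate. As a rough illustration only, and not the authors' procedure, the sketch below picks a score cutoff for one predictor from its scores on residues annotated as ordered. The function names, the convention that a residue is called disordered when its score is strictly above the cutoff, and the idea of calibrating on ordered residues alone are all assumptions made for this example.

```python
from typing import Sequence


def threshold_for_fpr(negative_scores: Sequence[float], target_fpr: float) -> float:
    """Return the lowest cutoff whose false positive rate stays at or below target_fpr.

    negative_scores: hypothetical predictor scores for residues known to be ordered.
    A residue is called disordered when its score is strictly greater than the cutoff.
    """
    if not negative_scores:
        raise ValueError("need at least one score from an ordered residue")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must lie in [0, 1]")
    ordered = sorted(negative_scores)
    n = len(ordered)
    allowed = int(target_fpr * n)  # number of false positives we can tolerate
    if allowed >= n:
        return float("-inf")  # every residue may be called disordered
    # Only scores strictly above this value are called positive, so at most
    # `allowed` ordered residues end up as false positives (fewer if scores tie).
    return ordered[n - allowed - 1]


def false_positive_rate(negative_scores: Sequence[float], threshold: float) -> float:
    """Fraction of ordered residues whose score exceeds the cutoff."""
    return sum(s > threshold for s in negative_scores) / len(negative_scores)
```

Applying the same `target_fpr` to each predictor's own scores yields one cutoff per predictor, which puts predictors with different score scales on a comparable footing before their outputs are combined.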
12
Evidence for multiple recent host species shifts among the Ranaviruses (family Iridoviridae). J Virol 2009; 84:2636-47. [PMID: 20042506 DOI: 10.1128/jvi.01991-09] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.8] [Indexed: 11/20/2022] Open
Abstract
Members of the genus Ranavirus (family Iridoviridae) have been recognized as major viral pathogens of cold-blooded vertebrates. Ranaviruses have been associated with amphibians, fish, and reptiles. At this time, the relationships between ranavirus species are still unclear. Previous studies suggested that ranaviruses from salamanders are more closely related to ranaviruses from fish than they are to ranaviruses from other amphibians, such as frogs. Therefore, to gain a better understanding of the relationships among ranavirus isolates, the genome of epizootic hematopoietic necrosis virus (EHNV), an Australian fish pathogen, was sequenced. Our findings suggest that the ancestral ranavirus was a fish virus and that several recent host shifts have taken place, with subsequent speciation of viruses in their new hosts. The data suggesting several recent host shifts among ranavirus species increase concern that these pathogens of cold-blooded vertebrates may have the capacity to cross numerous poikilothermic species barriers and the potential to cause devastating disease in their new hosts.
13
Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome. BMC Genomics 2009; 10:606. [PMID: 20003482 PMCID: PMC2807441 DOI: 10.1186/1471-2164-10-606] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 05/15/2009] [Accepted: 12/15/2009] [Indexed: 11/17/2022] Open
Abstract
Background Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region. Results The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins. Conclusions Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. 
Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo.
14
Klammer M, Messina DN, Schmitt T, Sonnhammer ELL. MetaTM - a consensus method for transmembrane protein topology prediction. BMC Bioinformatics 2009; 10:314. [PMID: 19785723 PMCID: PMC2761906 DOI: 10.1186/1471-2105-10-314] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 01/13/2009] [Accepted: 09/28/2009] [Indexed: 02/06/2023] Open
Abstract
Background Transmembrane (TM) proteins are proteins that span a biological membrane one or more times. As their 3-D structures are hard to determine, experiments focus on identifying their topology (i.e., which parts of the amino acid sequence are buried in the membrane and which are located on either side of the membrane), but only a few topologies are known. Consequently, various computational TM topology predictors have been developed, but their accuracies are far from perfect. The prediction quality can be improved by applying a consensus approach, which combines results of several predictors to yield a more reliable result. Results A novel TM consensus method, named MetaTM, is proposed in this work. MetaTM is based on support vector machine models and combines the results of six TM topology predictors and two signal peptide predictors. On a large data set comprising 1460 sequences of TM proteins with known topologies and 2362 globular protein sequences it correctly predicts 86.7% of all topologies. Conclusion Combining several TM predictors in a consensus prediction framework improves overall accuracy compared to any of the individual methods. Our proposed SVM-based system also has higher accuracy than a previous consensus predictor. MetaTM is made available both as downloadable source code and as DAS server at
Affiliation(s)
- Martin Klammer
- Stockholm Bioinformatics Centre, Albanova, Stockholm University, 10691 Stockholm, Sweden.
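The abstract of this entry describes a consensus approach that combines the results of several topology predictors. MetaTM itself is based on support vector machine models; the sketch below swaps in a much simpler per-residue majority vote purely to illustrate the general idea of a consensus. The label alphabet (`i` for inside, `M` for membrane, `o` for outside), the function name, and the tie-breaking rule are assumptions for this example and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def consensus_topology(predictions: Sequence[str]) -> str:
    """Combine per-residue topology strings by majority vote at each position.

    predictions: one label string per hypothetical predictor, all of equal length.
    Ties are resolved in favour of the label from the earliest predictor in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same number of residues")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)  # keeps labels in first-seen order
        # max() returns the first label with the highest count, so a tie goes
        # to the label that appeared first, i.e. the earliest predictor.
        consensus.append(max(counts, key=counts.get))
    return "".join(consensus)
```

A plain vote ignores how reliable each predictor is and does not guarantee a physically sensible topology, which is one reason a trained model may be preferable to this kind of simple rule.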
15
Xu D. Computational methods for protein sequence comparison and search. Current Protocols in Protein Science 2009; Chapter 2:2.1.1-2.1.27. [PMID: 19365790 DOI: 10.1002/0471140864.ps0201s56] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/06/2023]
Abstract
Protein sequence comparison and search has become commonplace not only for bioinformatics researchers but also for experimentalists in many cases. Because of the exponential growth in sequence data, sequence comparison in particular has become an increasingly important tool. Relating a new gene sequence to other known sequences often reveals its function, structure, and evolution. Many sequence comparison and search tools are available through public Web servers, and biologists can use them easily with little knowledge of computers or bioinformatics. This unit provides some theoretical background and describes popular tools for dot plot, sequence search against a database, multiple sequence alignments, protein tree construction, and protein family and motif search. Step-by-step examples are provided to illustrate how to use some of the most well-known tools. Finally, some general advice is given on combining different sequence analysis tools for biological inference.
Affiliation(s)
- Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, Missouri
16
Malpel S, Merlin C, François MC, Jacquin-Joly E. Molecular identification and characterization of two new Lepidoptera chemoreceptors belonging to the Drosophila melanogaster OR83b family. Insect Molecular Biology 2008; 17:587-596. [PMID: 18828844 DOI: 10.1111/j.1365-2583.2008.00830.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 05/26/2023]
Abstract
In insect antennae, olfaction depends on olfactory receptors (ORs) that function through heterodimerization with an unusually highly conserved partner orthologue to the Drosophila melanogaster DOR83b. Here, we report the identification of two cDNAs encoding new DOR83b orthologues that represent the first members, although nonconventional, of the OR families of two noctuid crop pests, the cotton leafworm Spodoptera littoralis and the cabbage armyworm Mamestra brassicae. They both displayed high protein sequence conservation with previously identified DOR83b orthologues. Transcripts were abundantly detected in adult chemosensory organs as well as in fifth instar larvae heads. In adult antennae, the expression patterns of both genes revealed common features with other members of the OR83b subfamily: they appeared to be expressed at the bases of numerous olfactory sensilla belonging to different functional categories, suggesting that both receptors may be co-expressed with yet unidentified conventional ORs. Bioinformatic analyses predicted the occurrence of seven transmembrane domains and an unusual topology with intracellular N-termini and extracellular C-termini, extending to Lepidoptera the hypothesis of an inverted topology for DOR83b orthologues, demonstrated to date only in D. melanogaster.
Affiliation(s)
- S Malpel
- INRA-UPMC-AgroParisTech UMR 1272 PISC Physiologie de l'Insecte: Signalisation et Communication, Versailles, France
17
Lundin C, Käll L, Kreher SA, Kapp K, Sonnhammer EL, Carlson JR, von Heijne G, Nilsson I. Membrane topology of the Drosophila OR83b odorant receptor. FEBS Lett 2007; 581:5601-4. [PMID: 18005664 PMCID: PMC2176074 DOI: 10.1016/j.febslet.2007.11.007] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.3] [Received: 10/10/2007] [Revised: 11/02/2007] [Accepted: 11/02/2007] [Indexed: 11/22/2022]
Abstract
By analogy to mammals, odorant receptors (ORs) in insects, such as Drosophila melanogaster, have long been thought to belong to the G-protein coupled receptor (GPCR) superfamily. However, recent work has cast doubt on this assumption and has tentatively suggested an inverted topology compared to the canonical N(out) - C(in) 7 transmembrane (TM) GPCR topology, at least for some Drosophila ORs. Here, we report a detailed topology mapping of the Drosophila OR83b receptor using engineered glycosylation sites as topology markers. Our results are inconsistent with a classical GPCR topology and show that OR83b has an intracellular N-terminus, an extracellular C-terminus, and 7TM helices.
Affiliation(s)
- Carolina Lundin
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
- Lukas Käll
- Stockholm Bioinformatics Center, AlbaNova, SE-106 91 Stockholm, Sweden
- Scott A. Kreher
- Department of Molecular, Cellular, and Development Biology, Yale University, New Haven, CT 06520, USA
- Katja Kapp
- ZMBH (Zentrum für Molekulare Biologie Heidelberg), University Heidelberg, Im Neuenheimer Feld 282, D-69120 Heidelberg, Germany
- John R. Carlson
- Department of Molecular, Cellular, and Development Biology, Yale University, New Haven, CT 06520, USA
- Gunnar von Heijne
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
- Stockholm Bioinformatics Center, AlbaNova, SE-106 91 Stockholm, Sweden
- IngMarie Nilsson
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
|
18
|
Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JGR, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R. GENCODE: producing a reference annotation for ENCODE. Genome Biol 2006; 7 Suppl 1:S4.1-9. [PMID: 16925838 PMCID: PMC1810553 DOI: 10.1186/gb-2006-7-s1-s4] [Citation(s) in RCA: 456] [Impact Index Per Article: 24.0] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. RESULTS The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. CONCLUSION In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.
Affiliation(s)
- Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK.
|
19
|
Bajic VB, Brent MR, Brown RH, Frankish A, Harrow J, Ohler U, Solovyev VV, Tan SL. Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment. Genome Biol 2006; 7 Suppl 1:S3.1-13. [PMID: 16925837 PMCID: PMC1810552 DOI: 10.1186/gb-2006-7-s1-s3] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. RESULTS The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. CONCLUSION The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.
Affiliation(s)
- Vladimir B Bajic
- South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa.
|
20
|
Huq NL, Cross KJ, Ung M, Reynolds EC. A review of protein structure and gene organisation for proteins associated with mineralised tissue and calcium phosphate stabilisation encoded on human chromosome 4. Arch Oral Biol 2005; 50:599-609. [PMID: 15892946 DOI: 10.1016/j.archoralbio.2004.12.009] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.4] [Received: 09/06/2004] [Accepted: 12/23/2004] [Indexed: 12/14/2022]
Abstract
Several proteins associated with mineralised tissue (teeth and bone) or involved in calcium phosphate stabilisation in the body fluids, milk and saliva have been mapped to the q arm of human chromosome 4. These include the dentine/bone proteins dentine sialophosphoprotein (DSPP), dentine matrix protein 1 (DMP1), bone sialoprotein (BSP), matrix extracellular phosphoglycoprotein, osteopontin (OPN), enamelin, ameloblastin, milk caseins, salivary statherin, and proline-rich proteins. The proposed function of those that are multiphosphorylated is: (i) the stabilisation of calcium phosphate in solution (e.g. casein, statherin) preventing spontaneous precipitation and seeded-crystal growth or (ii) promoting biomineralisation (e.g. the phosphophoryn domain of DSPP), where the protein described as a template macromolecule, is proposed to act as a nucleator/promoter of crystal growth. The genes of these proteins have been subjected to conserved chromosomal synteny during mammalian evolution. The multiphosphorylated proteins statherin, caseins, phosphophoryn, BSP and OPN have been characterised as intrinsically disordered. The codon usage patterns for the amino acid serine reveal a bias for AGC and AGT codons within the human genes dspp, dmp1 and bsp, mouse dspp and dmp1 but not significantly for statherin or caseins. This pattern was also observed in the gene encoding hen phosvitin that also contains stretches of multiphosphorylated serines and in the dmp1 gene sequences of mammalian, reptilian and avian classes. In conclusion, these intrinsically disordered multiphosphorylated proteins are the translation products of genes displaying examples of codon usage bias, internal repeats and conserved chromosomal synteny within the mammalian class.
Affiliation(s)
- N Laila Huq
- Cooperative Research Centre for Oral Health Science, School of Dental Science, The University of Melbourne, 711 Elizabeth Street, Melbourne, Vic. 3010, Australia
|
21
|
Hodges E, Redelius JS, Wu W, Höög C. Accelerated discovery of novel protein function in cultured human cells. Mol Cell Proteomics 2005; 4:1319-27. [PMID: 15965266 DOI: 10.1074/mcp.m500117-mcp200] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022] Open
Abstract
Experimental approaches that enable direct investigation of human protein function are necessary for comprehensive annotation of the human proteome. We introduce a cell-based platform for rapid and unbiased functional annotation of undercharacterized human proteins. Utilizing a library of antibody biomarkers, the full-length proteins are investigated by tracking phenotypic changes caused by overexpression in human cell lines. We combine reverse transfection and immunodetection by fluorescence microscopy to facilitate this procedure at high resolution. Demonstrating the advantage of this approach, new annotations are provided for two novel proteins: 1) a membrane-bound O-acyltransferase protein (C3F) that, when overexpressed, disrupts Golgi and endosome integrity due likely to an endoplasmic reticulum-Golgi transport block and 2) a tumor marker (BC-2) that prompts a redistribution of a transcriptional silencing protein (BMI1) and a mitogen-activated protein kinase mediator (Rac1) to distinct nuclear regions that undergo chromatin compaction. Our strategy is an immediate application for directly addressing those proteins whose molecular function remains unknown.
Affiliation(s)
- Emily Hodges
- Center for Genomics and Bioinformatics, Karolinska Institute, SE-171 77 Stockholm, Sweden
|
22
|
Henricson A, Käll L, Sonnhammer ELL. A novel transmembrane topology of presenilin based on reconciling experimental and computational evidence. FEBS J 2005; 272:2727-33. [PMID: 15943807 DOI: 10.1111/j.1742-4658.2005.04691.x] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
The transmembrane topology of presenilins is still the subject of debate despite many experimental topology studies using antibodies or gene fusions. The results from these studies are partly contradictory and consequently several topology models have been proposed. Studies of presenilin-interacting proteins have produced further contradiction, primarily regarding the location of the C-terminus. It is thus impossible to produce a topology model that agrees with all published data on presenilin. We have analyzed the presenilin topology through computational sequence analysis of the presenilin family and the homologous presenilin-like protein family. Members of these families are intramembrane-cleaving aspartyl proteases. Although the overall sequence homology between the two families is low, they share the conserved putative active site residues and the conserved 'PAL' motif. Therefore, the topology model for the presenilin-like proteins can give some clues about the presenilin topology. Here we propose a novel nine-transmembrane topology with the C-terminus in the extracytosolic space. This model has strong support from published data on gamma-secretase function and presenilin topology. Contrary to most presenilin topology models, we show that hydrophobic region X is probably a transmembrane segment. Consequently, the C-terminus would be located in the extracytosolic space. However, the last C-terminal amino acids are relatively hydrophobic and in conjunction with existing experimental data we cannot exclude the possibility that the extreme C-terminus could be buried within the gamma-secretase complex. This might explain the difficulties in obtaining consistent experimental evidence regarding the location of the C-terminal region of presenilin.
Affiliation(s)
- Anna Henricson
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
|
23
|
Martínez C, Sanjuan M, Dent J, Karlsson L, Ware J. Human septin-septin interactions as a prerequisite for targeting septin complexes in the cytosol. Biochem J 2005; 382:783-91. [PMID: 15214843 PMCID: PMC1133953 DOI: 10.1042/bj20040372] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.1] [Received: 03/08/2004] [Revised: 06/03/2004] [Accepted: 06/23/2004] [Indexed: 12/30/2022]
Abstract
Septins are a cytosolic GTP-binding protein family first characterized in yeast, but gaining increasing recognition as critical protagonists in higher eukaryotic cellular events. Mammalian septins have been associated with cytokinesis and exocytosis, along with contributing to the development of neurological disorders. Ten different septins, divided into four groups, have been identified in mammals, and individual septins are capable of interacting with each other to form macromolecular complexes. The present study characterizes the structural requirements for human septin-septin interactions using a yeast two-hybrid system. We focus on three septins that are highly expressed in platelets and neurons, SEPT4 [previously designated H5, CDCrel-2 (cell-division-control-related-2), PNUTL2], SEPT5 (CDCrel-1, PNUTL1) and SEPT8 (KIAA0202). Each of these three septins contains a characteristic domain structure consisting of unique N- and C-termini, and a central core domain conserved among the family of proteins. The yeast two-hybrid system yielded data consistent with a model where each of the three septins can interact with itself (homotypic assembly) or with one of the other septins (heterotypic assembly). For SEPT5 and SEPT8, the results illustrate a model whereby heterotypic septin assembly is dependent on the conserved central core domain and homotypic interactions require the N- and C-termini of each protein. We also characterized a model in which the proper cellular localization of SEPT5 and SEPT8 requires concomitant expression of both proteins. Co-transfection of SEPT5 and SEPT8 results in both proteins targeted to a vesicular-like location. Therefore the cellular repertoire of human septins has an impact on function by targeting septin macromolecular complexes to specific cellular locations.
Affiliation(s)
- Constantino Martínez
- *The Roon Research Center for Arteriosclerosis and Thrombosis, Division of Experimental Hemostasis and Thrombosis, Department of Molecular and Experimental Medicine The Scripps Research Institute, La Jolla, CA 92037, U.S.A
- Miguel A. Sanjuan
- †Johnson & Johnson Pharmaceutical Research & Development, L.L.C., La Jolla, CA 92121, U.S.A
- Judith A. Dent
- *The Roon Research Center for Arteriosclerosis and Thrombosis, Division of Experimental Hemostasis and Thrombosis, Department of Molecular and Experimental Medicine The Scripps Research Institute, La Jolla, CA 92037, U.S.A
- Lars Karlsson
- †Johnson & Johnson Pharmaceutical Research & Development, L.L.C., La Jolla, CA 92121, U.S.A
- Jerry Ware
- *The Roon Research Center for Arteriosclerosis and Thrombosis, Division of Experimental Hemostasis and Thrombosis, Department of Molecular and Experimental Medicine The Scripps Research Institute, La Jolla, CA 92037, U.S.A
- To whom correspondence should be addressed at the present address: Department of Physiology and Biophysics, #505, University of Arkansas for Medical Sciences, Little Rock, AR 72223, U.S.A.
|
24
|
Akerman M, Shaked-Mishan P, Mazareb S, Volpin H, Zilberstein D. Novel motifs in amino acid permease genes from Leishmania. Biochem Biophys Res Commun 2005; 325:353-66. [PMID: 15522240 DOI: 10.1016/j.bbrc.2004.09.212] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Received: 09/17/2004] [Indexed: 11/29/2022]
Abstract
Eight amino acid permease genes from the protozoan parasite Leishmania donovani (AAPLDs) were cloned, sequenced, and shown to be expressed in promastigotes. Seven of these belong to the amino acid transporter-1 and one to the amino acid polyamino-choline superfamilies. Using these sequences as well as known and characterized amino acid permease genes from all kingdoms, a training set was established and used to search for motifs, using the MEME motif discovery tool. This study revealed two motifs that are specific to the genus Leishmania, four to the family trypanosomatidae, and a single motif that is common between trypanosomatidae and mammalian systems A1 and N. Interestingly, most of these motifs are clustered in two regions of 50-60 amino acids. Blast search analyses indicated a close relationship between the L. donovani and Trypanosoma brucei amino acid permeases. The results of this work describe the cloning of the first amino acid permease genes in parasitic protozoa and contribute to the understanding of amino acid permease evolution in these organisms. Furthermore, the identification of genus-specific motifs in these proteins might be useful to better understand parasite physiology within its hosts.
Affiliation(s)
- Martin Akerman
- Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
|
25
|
Functional characterization in Caenorhabditis elegans of transmembrane worm-human orthologs. BMC Genomics 2004; 5:85. [PMID: 15533247 PMCID: PMC533873 DOI: 10.1186/1471-2164-5-85] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Received: 04/06/2004] [Accepted: 11/08/2004] [Indexed: 11/10/2022] Open
Abstract
Background The complete genome sequences for human and the nematode Caenorhabditis elegans offer an opportunity to learn more about human gene function through functional characterization of orthologs in the worm. Based on a previous genome-wide analysis of worm-human orthologous transmembrane proteins, we selected seventeen genes to explore experimentally in C. elegans. These genes were selected on the basis that they all have high confidence candidate human orthologs and that their function is unknown. We first analyzed their phylogeny, membrane topology and domain organization. Then gene functions were studied experimentally in the worm by using RNA interference and transcriptional gfp reporter gene fusions. Results The experiments gave functional insights for twelve of the genes studied. For example, C36B1.12, the worm ortholog of three presenilin-like genes, was almost exclusively expressed in head neurons, suggesting an ancient conserved role important to neuronal function. We propose a new transmembrane topology for the presenilin-like protein family. sft-4, the worm ortholog of surfeit locus gene Surf-4, proved to be an essential gene required for development during the larval stages of the worm. R155.1, whose human ortholog is entirely uncharacterized, was implicated in body size control and other developmental processes. Conclusions By combining bioinformatics and C. elegans experiments on orthologs, we provide functional insights on twelve previously uncharacterized human genes.
|
26
|
Chalk AM, Wahlestedt C, Sonnhammer ELL. Improved and automated prediction of effective siRNA. Biochem Biophys Res Commun 2004; 319:264-74. [PMID: 15158471 DOI: 10.1016/j.bbrc.2004.04.181] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.5] [Received: 04/26/2004] [Indexed: 12/18/2022]
Abstract
Short interfering RNAs are used in functional genomics studies to knock down a single gene in a reversible manner. The results of siRNA experiments are highly dependent on the choice of siRNA sequence. In order to evaluate siRNA design rules, we collected a database of 398 siRNAs of known efficacy from 92 genes. We used this database to evaluate previously proposed rules from smaller datasets, and to find a new set of rules that are optimal for the entire database. We also trained a regression tree with full cross-validation. It was however difficult to obtain the same precision as methods previously tested on small datasets from one or two genes. We show that those methods are overfitting as they work poorly on independent validation datasets from multiple genes. Our new design rules can predict siRNAs with efficacy ≥50% in 91% of cases, and with efficacy ≥90% in 52% of cases, which is more than a twofold improvement over random selection. Software for designing siRNAs is available online via a web server at or as a standalone version for high-throughput applications.
Affiliation(s)
- Alistair M Chalk
- Center for Genomics and Bioinformatics, Karolinska Institutet, Berzelius väg 35, S-171 77 Stockholm, Sweden.
|
27
|
Abstract
The proportion of the genome encoding intrinsically unstructured proteins increases with the complexity of organisms, which demands specific mechanism(s) for generating novel genetic material of this sort. Here it is suggested that one such mechanism is the expansion of internal repeat regions, i.e., coding micro- and minisatellites. An analysis of 126 known unstructured sequences shows the preponderance of repeats: the percentage of proteins with tandemly repeated short segments is much higher in this class (39%) than earlier reported for all Swiss-Prot (14%), yeast (18%) or human (28%) proteins. Furthermore, prime examples, such as salivary proline-rich proteins, titin, eukaryotic RNA polymerase II, the prion protein and several others, demonstrate that the repetitive segments carry fundamental function in these proteins. In addition, their repeat numbers show functionally significant interspecies variation and polymorphism, which underlines that these regions have been shaped by intense evolutionary activity. In all, the major point of this paper is that the genetic instability of repetitive regions combined with the structurally and functionally permissive nature of unstructured proteins has powered the extension and possible functional expansion of this newly recognized protein class.
Affiliation(s)
- Peter Tompa
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, 1518 Budapest, PO Box 7, Hungary.
|
28
|
da Fonseca PCA, Morris SA, Nerou EP, Taylor CW, Morris EP. Domain organization of the type 1 inositol 1,4,5-trisphosphate receptor as revealed by single-particle analysis. Proc Natl Acad Sci U S A 2003; 100:3936-41. [PMID: 12651956 PMCID: PMC153026 DOI: 10.1073/pnas.0536251100] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.3] [Indexed: 11/18/2022] Open
Abstract
The inositol 1,4,5-trisphosphate receptor (IP(3)R) is a tetrameric intracellular Ca(2+) channel, which mediates the release of Ca(2+) from the endoplasmic reticulum in response to many different extracellular stimuli. We present a 3D structure of the type 1 IP(3)R obtained by electron microscopy and single-particle analysis that reveals its domain organization. The IP(3)R has a flower-like appearance with fourfold symmetry and is made up of three distinct domains connected by slender links. By relating the organization of the structural domains to secondary-structure predictions and biochemical data we develop a model in which structural domains are mapped onto the amino acid sequence to deduce the location of the channel region and the cytoplasmic inositol 1,4,5-trisphosphate-binding and modulatory subdomains. The structure of the IP(3)R is compared with that of other tetrameric cation channels. The channel domain is similar in size and shape to its counterparts in the ryanodine receptor and the Shaker voltage-gated K(+) channel.
Affiliation(s)
- Paula C A da Fonseca
- Biomedical Sciences Division, Sir Alexander Fleming Building, Imperial College of Science Technology and Medicine, London SW7 2AZ, United Kingdom
|
29
|
|