Reference Citation Analysis

For: Birney E, Andrews D, Bevan P, Caccamo M, Cameron G, Chen Y, Clarke L, Coates G, Cox T, Cuff J, Curwen V, Cutts T, Down T, Durbin R, Eyras E, Fernandez-Suarez XM, Gane P, Gibbins B, Gilbert J, Hammond M, Hotz H, Iyer V, Kahari A, Jekosch K, Kasprzyk A, Keefe D, Keenan S, Lehvaslaiho H, McVicker G, Melsopp C, Meidl P, Mongin E, Pettett R, Potter S, Proctor G, Rae M, Searle S, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, Ureta-Vidal A, Woodwark C, Clamp M, Hubbard T. Ensembl 2004. Nucleic Acids Res 2004;32:D468-70. [PMID: 14681459 PMCID: PMC308772 DOI: 10.1093/nar/gkh038] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.2] [Indexed: 11/13/2022]

Number	Cited by Other Article(s)
1	Reactome graph database: Efficient access to complex pathway data. PLoS Comput Biol 2018;14:e1005968. [PMID: 29377902 PMCID: PMC5805351 DOI: 10.1371/journal.pcbi.1005968] [Citation(s) in RCA: 143] [Impact Index Per Article: 23.8] [Received: 07/19/2017] [Revised: 02/08/2018] [Accepted: 01/10/2018] [Indexed: 11/19/2022] Abstract: Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
2	A novel splice variant of XIAP-associated factor 1 (XAF1) is expressed in peripheral blood containing gastric cancer-derived circulating tumor cells. Gastric Cancer 2015;18:751-61. [PMID: 25216542 DOI: 10.1007/s10120-014-0426-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/12/2014] [Accepted: 08/23/2014] [Indexed: 02/07/2023] Abstract: BACKGROUND XIAP-associated factor 1 (XAF1) is ubiquitously expressed in normal tissues, but its suppression in cancer cells is strongly associated with tumor progression. Although downregulation of XAF1 is observed in tumors, its expression profile in the peripheral blood of cancer patients has not yet been investigated. Here, we identified a novel XAF1 splice variant in cancer cells and then investigated the expression level of this variant in peripheral blood containing gastric cancer-derived circulating tumor cells (CTCs). METHODS To identify splice variants, RT-PCR and DNA sequencing were performed on mRNAs extracted from many cancer cells. We then carried out quantitative RT-PCR to investigate expression in peripheral blood from 96 gastric cancer patients and 22 healthy volunteers. RESULTS The XAF1 variant harbored a premature termination codon (PTC) and was differentially expressed in highly metastatic cancer cells versus the parental cells, and nonsense-mediated mRNA decay (NMD) was suppressed in the variant-expressing cells. Furthermore, splice variants of XAF1 were upregulated in peripheral blood containing CTCs. In XAF1 variant-expressing patients, the expression levels of other NMD-targeted genes also increased, suggesting that the NMD pathway was suppressed in CTCs. CONCLUSIONS Our study identified a novel splice variant of XAF1 in cancer cells.
This variant was regulated through the NMD pathway and accumulated in NMD-suppressed metastatic cancer cells and peripheral blood containing CTCs. The presence of XAF1 transcripts harboring the PTC in the peripheral blood may be useful as an indicator of NMD inhibition in CTCs. Key Words: Alternative splicing; Circulating tumor cells; Gastric cancer; Nonsense-mediated mRNA decay; Quantitative real-time polymerase chain reaction (qRT-PCR). MESH Headings: Adaptor Proteins, Signal Transducing; Adult; Aged; Aged, 80 and over; Apoptosis Regulatory Proteins; Female; Humans; Intracellular Signaling Peptides and Proteins/genetics; Male; Middle Aged; Neoplasm Proteins/genetics; Neoplastic Cells, Circulating/pathology; Protein Isoforms/genetics; RNA, Messenger/analysis; RNA, Messenger/genetics; Real-Time Polymerase Chain Reaction; Stomach Neoplasms/blood; Stomach Neoplasms/genetics; Stomach Neoplasms/pathology; Transcriptome.
3	Metabolic and chaperone gene loss marks the origin of animals: evidence for Hsp104 and Hsp78 chaperones sharing mitochondrial enzymes as clients. PLoS One 2015;10:e0117192. [PMID: 25710177 PMCID: PMC4339202 DOI: 10.1371/journal.pone.0117192] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 08/11/2014] [Accepted: 12/17/2014] [Indexed: 12/31/2022] Abstract: The evolution of animals involved acquisition of an emergent gene repertoire for gastrulation. Whether loss of genes also co-evolved with this developmental reprogramming has not yet been addressed. Here, we identify twenty-four genetic functions that are retained in fungi and choanoflagellates but undetectable in animals. These lost genes encode: (i) sixteen distinct biosynthetic functions; (ii) the two ancestral eukaryotic ClpB disaggregases, Hsp78 and Hsp104, which function in the mitochondria and cytosol, respectively; and (iii) six other assorted functions. We present computational and experimental data that are consistent with a joint function for the differentially localized ClpB disaggregases, and with the possibility of a shared client/chaperone relationship between the mitochondrial Fe/S homoaconitase encoded by the lost LYS4 gene and the two ClpBs. Our analyses lead to the hypothesis that the evolution of gastrulation-based multicellularity in animals led to efficient extraction of nutrients from dietary sources, loss of natural selection for maintenance of energetically expensive biosynthetic pathways, and subsequent loss of their attendant ClpB chaperones.
MESH Headings: Aconitate Hydratase/classification; Aconitate Hydratase/genetics; Animals; Bayes Theorem; Choanoflagellata/genetics; Endopeptidase Clp/classification; Endopeptidase Clp/genetics; Heat-Shock Proteins/genetics; Heat-Shock Proteins/metabolism; Likelihood Functions; Mitochondria/enzymology; Mitochondria/metabolism; Mutation; Phylogeny; Promoter Regions, Genetic; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism.
4	Discovery of a metabolic alternative to the classical mevalonate pathway. eLife 2013;2:e00672. [PMID: 24327557 PMCID: PMC3857490 DOI: 10.7554/elife.00672] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Indexed: 11/13/2022] Abstract: Eukarya, Archaea, and some Bacteria encode all or part of the essential mevalonate (MVA) metabolic pathway clinically modulated using statins. Curiously, two components of the MVA pathway are often absent from archaeal genomes. The search for these missing elements led to the discovery of isopentenyl phosphate kinase (IPK), one of two activities necessary to furnish the universal five-carbon isoprenoid building block, isopentenyl diphosphate (IPP). Unexpectedly, we now report functional IPKs also exist in Bacteria and Eukarya. Furthermore, amongst a subset of species within the bacterial phylum Chloroflexi, we identified a new enzyme catalyzing the missing decarboxylative step of the putative alternative MVA pathway. These results demonstrate, for the first time, a functioning alternative MVA pathway. Key to this pathway are the catalytic actions of a newly uncovered enzyme, mevalonate phosphate decarboxylase (MPD), and IPK. Together, these two discoveries suggest that unforeseen variation in isoprenoid metabolism may be widespread in nature. DOI: http://dx.doi.org/10.7554/eLife.00672.001. Key Words: Archaea; Chloroflexi; Isopentenyl diphosphate; Mevalonate pathway; Mevalonate phosphate decarboxylase; Plants.
5	Robust demographic inference from genomic and SNP data. PLoS Genet 2013;9:e1003905. [PMID: 24204310 PMCID: PMC3812088 DOI: 10.1371/journal.pgen.1003905] [Citation(s) in RCA: 832] [Impact Index Per Article: 75.6] [Received: 02/28/2013] [Accepted: 09/11/2013] [Indexed: 01/09/2023] Abstract: We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models.
The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets. MESH Headings: Computer Simulation; Demography; Genetics, Population; Genome, Human; Genomics; Humans; Polymorphism, Single Nucleotide/genetics; Population Groups.
6	Novel protein isoforms of carcinoembryonic antigen are secreted from pancreatic, gastric and colorectal cancer cells. BMC Res Notes 2013;6:381. [PMID: 24070190 PMCID: PMC3850884 DOI: 10.1186/1756-0500-6-381] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/26/2013] [Accepted: 09/24/2013] [Indexed: 12/23/2022] Abstract: Background Carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) is an oncofetal cell surface glycoprotein. Because of its high expression in cancer cells and secretion into serum, CEA has been widely used as a serum tumor marker. Although other members of the CEACAM family have been investigated for splice variants and variant-derived protein isoforms, few studies of CEACAM5 variants have been reported. In this study, we demonstrated the existence of novel CEACAM5 splice variants and splice variant-derived protein isoforms in gastrointestinal cancer cell lines. Results We identified two novel CEACAM5 splice variants in gastrointestinal (pancreatic, gastric, and colorectal) cancer cell lines. One of the variants possessed an alternative minor splice site that allowed generation of a GC-AG intron. Furthermore, CEA protein isoforms derived from the novel splice variants were expressed in cancer cell lines and those protein isoforms were secreted into the culture medium. Although CEA protein isoforms always co-existed with the full-length protein, the secretion patterns of these isoforms did not correlate with the expression patterns. Conclusions This is the first study to identify the expression of CEA isoforms derived from the novel splice variants processed on the unique splice site. In addition, we also revealed the secretion of those isoforms from gastrointestinal cancer cell lines.
Our findings suggested that discrimination between the full-length and identified protein isoforms may improve the clinical utility of CEA as a tumor marker.
7	Database tools in genetic diseases research. Genomics 2013;101:75-85. [DOI: 10.1016/j.ygeno.2012.11.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/19/2012] [Revised: 10/26/2012] [Accepted: 11/01/2012] [Indexed: 01/22/2023]
8	Using Galaxy to perform large-scale interactive data analyses. Curr Protoc Bioinformatics 2012;Chapter 10:10.5.1-10.5.47. [PMID: 22700312 DOI: 10.1002/0471250953.bi1005s38] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022] Abstract: Innovations in biomedical research technologies continue to provide experimental biologists with novel and increasingly large genomic and high-throughput data resources to be analyzed. As creating and obtaining data has become easier, the key decision faced by many researchers is a practical one: where and how should an analysis be performed? Datasets are large and analysis tool set-up and use is riddled with complexities outside of the scope of core research activities. The authors believe that Galaxy provides a powerful solution that simplifies data acquisition and analysis in an intuitive Web application, granting all researchers access to key informatics tools previously only available to computational specialists working in Unix-based environments. We will demonstrate through a series of biomedically relevant protocols how Galaxy specifically brings together (1) data retrieval from public and private sources, for example, UCSC's Eukaryote and Microbial Genome Browsers, (2) custom tools (wrapped Unix functions, format standardization/conversions, interval operations), and (3) 3rd-party analysis tools.
9	Identification and characterization of lineage-specific highly conserved noncoding sequences in Mammalian genomes. Genome Biol Evol 2012;4:641-57. [PMID: 22505575 PMCID: PMC3381673 DOI: 10.1093/gbe/evs035] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Accepted: 03/23/2012] [Indexed: 01/12/2023] Abstract: Vertebrate genome comparisons revealed that there are highly conserved noncoding sequences (HCNSs) among a wide range of species, many of which contain regulatory elements. However, recently emerged sequences conserved in specific lineages have not been well studied. Toward this end, we identified 8,198 primate-specific and 21,128 rodent-specific HCNSs as representative ones among mammals from human-marmoset and mouse-rat comparisons, respectively. Derived allele frequency analysis of primate-specific HCNSs showed that these HCNSs were under purifying selection, indicating that they may harbor important functions. We selected the 1,000 largest HCNSs and compared the lineage-specific HCNS-flanking genes (LHF genes) with ultraconserved element (UCE)-flanking genes. Interestingly, the majority of LHF genes were different from UCE-flanking genes. This lineage-specific set of LHF genes was more enriched in protein-binding function. Conversely, the number of LHF genes that were also shared by UCEs was small but significantly larger than random expectation, and many of these genes were involved in anatomical development as transcriptional regulators, suggesting that certain groups of genes preferentially recruit new HCNSs in addition to old HCNSs that are conserved among vertebrates. This group of LHF genes might be involved in the various levels of lineage-specific evolution among vertebrates, mammals, primates, and rodents.
If so, the emergence of HCNSs in and around these two groups of LHF genes developed lineage-specific characteristics. Our results provide new insight into lineage-specific evolution through interactions between HCNSs and their LHF genes. Key Words: lineage-specific evolution; conserved noncoding sequence; mammals. MESH Headings: Animals; Base Sequence; Callithrix; Conserved Sequence/genetics; Evolution, Molecular; Genome; Humans; Mice; Molecular Sequence Data; Primates; Rats; Regulatory Sequences, Nucleic Acid; Sequence Alignment; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid.
10	The integration and annotation of the human interactome in the UniHI Database. Methods Mol Biol 2012;812:175-188. [PMID: 22218860 DOI: 10.1007/978-1-61779-455-1_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023] Abstract: In recent years, remarkable progress has been made toward the systematic charting of human protein interactions. However, the use of the generated interaction data has remained challenging for biomedical researchers due to a lack of integration of currently available resources. To facilitate direct access to and analysis of the human interactome, we have developed the Unified Human Interactome (UniHI) database. It provides researchers with a user-friendly Web interface and, in its latest version, integrates interaction data from 12 major resources, establishing one of the largest catalogs of human PPIs worldwide. At present, UniHI houses over 250,000 distinct interactions between 22,300 unique proteins and is publicly available at http://www.unihi.org. MESH Headings: Computer Graphics; Databases, Protein; Humans; Internet; Molecular Sequence Annotation/methods; Organ Specificity; Protein Interaction Mapping/methods; Proteins/metabolism; User-Computer Interface.
11	Testing computational prediction of missense mutation phenotypes: functional characterization of 204 mutations of human cystathionine beta synthase. Proteins 2010;78:2058-74. [PMID: 20455263 DOI: 10.1002/prot.22722] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 11/10/2022] Abstract: Predicting the phenotypes of missense mutations uncovered by large-scale sequencing projects is an important goal in computational biology. High-confidence predictions can be an aid in focusing experimental and association studies on those mutations most likely to be associated with causative relationships between mutation and disease. As an aid in developing these methods further, we have derived a set of random mutations of the enzymatic domains of human cystathionine beta synthase. This enzyme is a dimeric protein that catalyzes the condensation of serine and homocysteine to produce cystathionine. Yeast missing this enzyme cannot grow on medium lacking a source of cysteine, while transfection of functional human CBS into yeast strains missing endogenous enzyme can successfully complement for the missing gene. We used PCR mutagenesis with error-prone Taq polymerase to produce 948 colonies and compared cell growth in the presence or absence of a cysteine source as a measure of CBS function. We were able to infer the phenotypes of 204 single-site mutants, 79 of them deleterious and 125 neutral. This set was used to test the accuracy of six publicly available prediction methods for phenotype prediction of missense mutations: SIFT, PolyPhen, PMut, SNPs3D, PhD-SNP, and nsSNPAnalyzer. The top methods are PolyPhen, SIFT, and nsSNPAnalyzer, which have similar performance.
Using kernel discriminant functions, we found that the difference in position-specific scoring matrix values is more predictive than the wild-type PSSM score alone, and that the relative surface area in the biologically relevant complex is more predictive than that of the monomeric proteins.
12	PathEx: a novel multi factors based datasets selector web tool. BMC Bioinformatics 2010;11:528. [PMID: 20969778 PMCID: PMC2978222 DOI: 10.1186/1471-2105-11-528] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/14/2010] [Accepted: 10/22/2010] [Indexed: 11/27/2022] Abstract: Background Microarray experiments have become very popular in life science research. However, if such experiments are only considered independently, the possibilities for analysis and interpretation of many life science phenomena are reduced. The accumulation of publicly available data provides biomedical researchers with a valuable opportunity to either discover new phenomena or improve the interpretation and validation of other phenomena that are partially understood or well known. This can only be achieved by intelligently exploiting this rich mine of information. Description Considering that technologies like microarrays remain prohibitively expensive for researchers with limited means to order their own experimental chips, it would be beneficial to re-use previously published microarray data. For certain researchers interested in finding gene groups (requiring many replicates), there is a great need for tools to help them select appropriate datasets for analysis. These tools may be effective if, and only if, they are able to re-use previously deposited experiments or to create new experiments not initially envisioned by the depositors. However, the generation of new experiments requires that all published microarray data be completely annotated, which is not currently the case. Thus, we propose the PathEx approach.
Conclusion This paper presents PathEx, a human-focused web solution built around a two-component system: one database component, enriched with relevant biological information (expression array, omics data, literature) from different sources, and another component comprising sophisticated web interfaces that allow users to perform complex dataset building queries on the contents integrated into the PathEx database.
13	Using The Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes. Curr Protoc Bioinformatics 2010;Chapter 1:1.11.1-1.11.51. [DOI: 10.1002/0471250953.bi0111s30] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/08/2022]
14	Bioinformatic tools for identifying disease gene and SNP candidates. Methods Mol Biol 2010;628:307-19. [PMID: 20238089 DOI: 10.1007/978-1-60327-367-1_17] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022] Abstract: As databases of genome data continue to grow, our understanding of the functional elements of the genome grows as well. Many genetic changes in the genome have now been discovered and characterized, including both disease-causing mutations and neutral polymorphisms. In addition to experimental approaches to characterize specific variants, over the past decade, there has been intense bioinformatic research to understand the molecular effects of these genetic changes. In addition to genomic experimental assays, the bioinformatic efforts have focused on two general areas. First, researchers have annotated genetic variation data with molecular features that are likely to affect function. Second, statistical methods have been developed to predict mutations that are likely to have a molecular effect. In this protocol manuscript, methods for understanding the molecular functions of single nucleotide polymorphisms (SNPs) and mutations are reviewed and described. The intent of this chapter is to provide an introduction to the online tools that are both easy to use and useful.
15	Comprehensive transcriptional profiling of human epidermis, reconstituted epidermal equivalents, and cultured keratinocytes using DNA microarray chips. Methods Mol Biol 2010;585:193-223. [PMID: 19908006 DOI: 10.1007/978-1-60761-380-0_15] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/31/2022] Abstract: Because of its accessibility, skin has been among the first organs analyzed using DNA microarrays; psoriasis, melanomas, carcinomas, chronic wound biopsies, and epidermal keratinocytes in culture have been intensely investigated. Skin has everything: stem cells, differentiation, signaling, inflammation, diseases, cancer, etc. Here we provide step-by-step instructions for bioinformatics analysis of transcriptional profiling of skin. Specifically, we describe the use of GCOS and RMA programs for initial normalization and selection of differentially expressed genes, DAVID and LOLA programs for annotation of genes, and statistically relevant identification of over- and under-represented functional and biological categories in identified gene sets, L2L and Venn diagrams for comparing multiple lists of genes, and oPOSSUM for identification of statistically over-represented transcription factor binding sites in the promoter regions of gene sets. The work can be a primer for researchers embarking on skinomics, the comprehensive analysis of transcriptional changes in the skin.
16	WNT5A is regulated by PAX2 and may be involved in blastemal predominant Wilms tumorigenesis. Neoplasia 2009;10:1470-80. [PMID: 19048125 DOI: 10.1593/neo.08442] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 04/03/2008] [Revised: 09/29/2008] [Accepted: 10/06/2008] [Indexed: 01/22/2023] Abstract: The PAX2 gene encodes a transcription factor expressed during development. In humans, PAX2 mutations cause the renal-coloboma syndrome, whereas homozygous mutations are lethal, causing severe organ malformation, notably in the brain and kidney. Wilms tumor (WT) of the kidney results from a failure in the mesenchymal-epithelial transition, a crucial step partly controlled by PAX2. Downstream target genes regulated by PAX2 are still undefined. We therefore hypothesized that identification and characterization of the genes regulated by PAX2 may improve our understanding of developmentally related malignancies including WT. We used nickel agarose chromatin enrichment, chromatin immunoprecipitation, and the human embryonic kidney-derived cell line HEK293 to identify regulatory elements responding to PAX2. Among others, we identified WNT5A as a gene potentially regulated by PAX2. Here, we demonstrate that WNT5A is a direct target of PAX2 in HEK293 cells, using both transactivation and electrophoretic mobility shift assays. We were unable to find any WNT5A disease-associated mutations after screening a panel of 99 WT samples. However, quantitative reverse transcription-polymerase chain reaction in human favorable-histology WT revealed that approximately 66% of the cases expressed significantly less WNT5A than human fetal kidney. Moreover, the WiT9 WT cell line showed weak expression of the WNT5A gene.
A correlation of decreased WNT5A expression with predominant blastemal histology tumors suggests a possible inhibitory role in WT pathogenesis. This study underlines the importance of PAX2 in the regulation of WNT5A. Further in vivo study is necessary to determine whether PAX2 and WNT5A are truly involved in WT pathogenesis.
17	The Bioverse API and web application. Methods Mol Biol 2009;541:511-34. [PMID: 19381533 DOI: 10.1007/978-1-59745-243-4_22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/12/2022] Abstract: The Bioverse is a framework for creating, warehousing and presenting biological information based on hierarchical levels of organisation. The framework is guided by a deeper philosophy of desiring to represent all relationships between all components of biological systems towards the goal of a wholistic picture of organismal biology. Data from various sources are combined into a single repository and a uniform interface is exposed to access it. The power of the approach of the Bioverse is that, due to its inclusive nature, patterns emerge from the acquired data and new predictions are made. The implementation of this repository (beginning with acquisition of source data, processing in a pipeline, and concluding with storage in a relational database) and interfaces to the data contained in it, from a programmatic application interface to a user friendly web application, are discussed.
18	Transcriptional autoregulatory loops are highly conserved in vertebrate evolution. PLoS One 2008;3:e3210. [PMID: 18791639 PMCID: PMC2527657 DOI: 10.1371/journal.pone.0003210] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 06/17/2008] [Accepted: 08/18/2008] [Indexed: 01/06/2023] Abstract Background Feedback loops are the simplest building blocks of transcriptional regulatory networks, and therefore their behavior in the course of evolution is of prime interest. Methodology We address the question of whether autoregulatory feedback loops are enriched in higher organisms. First, based on predicted autoregulatory binding sites, we count the number of autoregulatory loops. We compare this count with estimates obtained either by assuming that each (conserved) gene has the same chance of being a target of a given factor or by assuming that each conserved sequence position has an equal chance of being a binding site of the factor. Conclusions We demonstrate that the numbers of putative autoregulatory loops conserved between human and fugu, danio or chicken are significantly higher than expected. Moreover, we show that conserved autoregulatory binding sites cluster close to the transcription start sites of the factors' own genes. We conclude that transcriptional autoregulatory feedback loops constitute a core transcriptional network motif and have been conserved throughout the evolution of higher vertebrates. MESH Headings: Animals; Binding Sites; Chickens; Conserved Sequence; Databases, Genetic; Evolution, Molecular; Gene Expression Regulation; Humans; Models, Statistical; Molecular Sequence Data; Probability; Takifugu; Transcription Factors/metabolism; Transcription, Genetic; Vertebrates/genetics; Zebrafish
19	Identifying alternative hyper-splicing signatures in MG-thymoma by exon arrays. PLoS One 2008;3:e2392. [PMID: 18545673 PMCID: PMC2409220 DOI: 10.1371/journal.pone.0002392] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 02/22/2008] [Accepted: 03/27/2008] [Indexed: 12/21/2022] Abstract BACKGROUND The vast majority of human genes (>70%) are alternatively spliced. Although alternative pre-mRNA processing is modified in multiple tumors, alternative hyper-splicing signatures specific to particular tumor types are still lacking. Here, we report the use of Affymetrix Human Exon Arrays to detect hyper-splicing events characteristic of myasthenia gravis (MG)-thymoma, thymic tumors that develop in patients with MG, and to discriminate them from changes in colon cancer. METHODOLOGY/PRINCIPAL FINDINGS We combined GO term-to-parent threshold-based and threshold-independent ad-hoc functional statistics with in-depth analysis of key modified transcripts to highlight various exon-specific changes. These denote alternative splicing in MG-thymoma tumors compared with healthy human thymus and with in-house and Affymetrix datasets from colon cancer and healthy tissues. By using both global and specific term-to-parent Gene Ontology (GO) statistical comparisons, our functional integrative ad-hoc method allowed the detection of disease-relevant splicing events. CONCLUSIONS/SIGNIFICANCE Hyper-spliced transcripts spanned several categories, including the tumorigenic ERBB4 tyrosine kinase receptor and the connective tissue growth factor CTGF, as well as the immune function-related histocompatibility gene HLA-DRB1 and interleukin (IL)19, two muscle-specific collagens and one myosin heavy chain gene; intriguingly, a putative new exon was discovered in the MG-involved acetylcholinesterase ACHE gene. 
Corresponding changes in spliceosome composition were indicated by co-decreases in the splicing factors ASF/SF2 and SC35. Parallel tumor-associated changes occurred in colon cancer as well, but the majority of the apparent hyper-splicing events were particular to MG-thymoma and could be validated by Fluorescent In-Situ Hybridization (FISH), Reverse Transcription-Polymerase Chain Reaction (RT-PCR) and mass spectrometry (MS) followed by peptide sequencing. Our findings demonstrate a particular alternative hyper-splicing signature for transcripts over-expressed in MG-thymoma, supporting the hypothesis that alternative hyper-splicing contributes to shaping the biological functions of these and other specialized tumors and opening new avenues for the development of diagnostic and treatment approaches. MESH Headings: Alternative Splicing; Colonic Neoplasms/genetics; Exons; Humans; In Situ Hybridization, Fluorescence; Proteomics; RNA, Messenger/genetics; Reverse Transcriptase Polymerase Chain Reaction; Thymus Neoplasms/genetics
20	The genomic analysis of erythrocyte microRNA expression in sickle cell diseases. PLoS One 2008;3:e2360. [PMID: 18523662 PMCID: PMC2408759 DOI: 10.1371/journal.pone.0002360] [Citation(s) in RCA: 131] [Impact Index Per Article: 8.2] [Received: 12/08/2007] [Accepted: 04/24/2008] [Indexed: 01/08/2023] Abstract Background Since mature erythrocytes are terminally differentiated cells without nuclei and organelles, it is commonly thought that they do not contain nucleic acids. In this study, we have re-examined this issue by analyzing the transcriptome of a purified population of human mature erythrocytes from individuals with normal hemoglobin (HbAA) and homozygous sickle cell disease (HbSS). Methods and Findings Using a combination of microarray analysis, real-time RT-PCR and Northern blots, we found that mature erythrocytes, while lacking ribosomal and large-sized RNAs, contain abundant and diverse microRNAs. The microRNA expression of erythrocytes differed from that of reticulocytes and leukocytes, and accounted for the majority of the microRNA expression in whole blood. When we used microRNA microarrays to analyze erythrocytes from HbAA and HbSS individuals, we noted a dramatic difference in their microRNA expression patterns. We found that miR-320 played an important role in the down-regulation of its target gene, CD71, during reticulocyte terminal differentiation. Further investigation revealed that poor expression of miR-320 in HbSS cells was associated with their defective downregulation of CD71 during terminal differentiation. Conclusions In summary, we have discovered significant microRNA expression in human mature erythrocytes, which is dramatically altered in HbSS erythrocytes and associated with their defective terminal differentiation. 
Thus, the global analysis of microRNA expression in circulating erythrocytes can provide mechanistic insights into the phenotypes of erythrocyte diseases.
21	Using the Arabidopsis Information Resource (TAIR) to find information about Arabidopsis genes. Curr Protoc Bioinformatics 2008;Chapter 1:Unit 1.11. [PMID: 18428741 DOI: 10.1002/0471250953.bi0111s9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Abstract The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org) is a comprehensive Web resource of Arabidopsis biology for plant scientists. TAIR curates and integrates information about genes, proteins, gene expression, mutant phenotypes, biological materials such as DNA and seed stocks, genetic markers, genetic and physical maps, biochemical pathways, genome organization, images of mutant plants and protein sub-cellular localizations, publications, and the research community. Data in TAIR are extensively interconnected and can be accessed through a variety of Web-based search and display tools. This unit primarily focuses on some basic methods for searching, browsing, visualizing, and analyzing information about Arabidopsis genes. Gene expression data from microarrays are a recent addition to the database, and methods for accessing these data are also described. Two pattern identification programs are described for mining TAIR's unique Arabidopsis sequence data sets. We also describe how to use AraCyc for mining plant metabolic pathways.
22	Web-based resources for clinical bioinformatics. Methods Mol Med 2008;141:309-29. [PMID: 18453097 DOI: 10.1007/978-1-60327-148-6_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023] Abstract In the post-Human Genome Project era, awareness of the resources available through the internet is essential to both molecular biologists and clinicians. An overview of the main databases and analytical tools described in this chapter is important for understanding the principles upon which hypotheses are generated, experiments are based and conclusions reached. Similarly, an introduction to the terminology of these resources often facilitates their use and adoption into practice. This chapter covers database resources such as NCBI/Entrez, Ensembl and UCSC as well as analytical tools for sequence alignment, promoter analysis and molecular interactions. MESH Headings: Base Sequence; Biomedical Research/methods; Computational Biology; Databases, Genetic/supply & distribution; Genome, Human; Humans; Internet; Polymorphism, Single Nucleotide; Promoter Regions, Genetic; Protein Binding; Sequence Analysis, DNA; Sequence Analysis, Protein; Sequence Analysis, RNA; Software
23	Bioinformatics detection of alternative splicing. Methods Mol Biol 2008;452:179-97. [PMID: 18566765 DOI: 10.1007/978-1-60327-159-2_9] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022] Abstract In recent years, genome-wide detection of alternative splicing based on Expressed Sequence Tag (EST) sequence alignments with mRNA and genomic sequences has dramatically expanded our understanding of the role of alternative splicing in functional regulation. This chapter reviews the data, methodology, and technical challenges of these genome-wide analyses of alternative splicing, and briefly surveys some of the uses to which such alternative splicing databases have been put. For example, with proper alternative splicing database schema design, it is possible to query genome-wide for alternative splicing patterns that are specific to particular tissues, disease states (e.g., cancer), gender, or developmental stages. EST alignments can be used to estimate exon inclusion or exclusion level of alternatively spliced exons and evolutionary changes for various species can be inferred from exon inclusion level. Such databases can also help automate design of probes for RT-PCR and microarrays, enabling high throughput experimental measurement of alternative splicing.
24	Gene Set Expression Comparison kit for BRB-ArrayTools. Bioinformatics 2007;24:137-9. [DOI: 10.1093/bioinformatics/btm541] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]
25	Systematic identification of cis-regulatory sequences active in mouse and human embryonic stem cells. PLoS Genet 2007;3:e145. [PMID: 17784790 PMCID: PMC1959362 DOI: 10.1371/journal.pgen.0030145] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.4] [Received: 03/26/2007] [Accepted: 07/10/2007] [Indexed: 01/06/2023] Abstract Understanding the transcriptional regulation of pluripotent cells is of fundamental interest and will greatly inform efforts aimed at directing differentiation of embryonic stem (ES) cells or reprogramming somatic cells. We first analyzed the transcriptional profiles of mouse ES cells and primordial germ cells and identified genes upregulated in pluripotent cells both in vitro and in vivo. These genes are enriched for roles in transcription, chromatin remodeling, cell cycle, and DNA repair. We developed a novel computational algorithm, CompMoby, which combines analyses of sequences both aligned and non-aligned between different genomes with a probabilistic segmentation model to systematically predict short DNA motifs that regulate gene expression. CompMoby was used to identify conserved overrepresented motifs in genes upregulated in pluripotent cells. We show that the motifs are preferentially active in undifferentiated mouse ES and embryonic germ cells in a sequence-specific manner, and that they can act as enhancers in the context of an endogenous promoter. Importantly, the activity of the motifs is conserved in human ES cells. We further show that the transcription factor NF-Y specifically binds to one of the motifs, is differentially expressed during ES cell differentiation, and is required for ES cell proliferation. This study provides novel insights into the transcriptional regulatory networks of pluripotent cells. 
Our results suggest that this systematic approach can be broadly applied to understanding transcriptional networks in mammalian species. Embryonic stem cells have two remarkable properties: they can proliferate very rapidly, and they can give rise to all of the body's cell types. Understanding how gene activity is regulated in embryonic stem cells will be an important step towards therapeutic applications. The activity of genes is regulated by proteins called transcription factors, which bind to stretches of DNA sequences that act as on or off switches. We identified genes that are active in mouse embryonic stem cells but not in differentiated cells. We reasoned that if these genes have similar patterns of activity, they may be regulated by the same transcription factors. We therefore developed a computational approach that takes information on gene activity and predicts DNA sequences that may act as switches. Using this approach, we discovered new DNA switches that regulate gene activity in mouse and human embryonic stem cells. Furthermore, we identified a transcription factor that binds to one of these DNA switches and is important for the rapid proliferation of embryonic stem cells. Our approach sheds light on the genetic regulation of embryonic stem cells and will be broadly applicable to questions of how gene activity is regulated in other cell types of interest. MESH Headings: Animals; CCAAT-Binding Factor/genetics; CCAAT-Binding Factor/physiology; Cell Line; Cell Proliferation; Cells, Cultured; Computational Biology/methods; Embryonic Stem Cells/chemistry; Embryonic Stem Cells/cytology; Embryonic Stem Cells/metabolism; Embryonic Stem Cells/physiology; Humans; Mice; Mice, Transgenic; Multigene Family; NIH 3T3 Cells; Oligonucleotide Array Sequence Analysis; Regulatory Sequences, Nucleic Acid. Grants: R01 GM070808 NIGMS NIH HHS; GM070808 NIGMS NIH HHS
26	The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 2007;5:e16. [PMID: 17355171 PMCID: PMC1821046 DOI: 10.1371/journal.pbio.0050016] [Citation(s) in RCA: 667] [Impact Index Per Article: 39.2] [Received: 03/24/2006] [Accepted: 08/15/2006] [Indexed: 02/04/2023] Abstract Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. 
These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature. The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. Given the wide-ranging roles microbes play in many ecosystems, metagenomics studies of microbial communities will reveal insights into protein families and their evolution. Because most microbes will not grow in the laboratory using current cultivation techniques, scientists have turned to cultivation-independent techniques to study microbial diversity. One such technique—shotgun sequencing—allows random sampling of DNA sequences to examine the genomic material present in a microbial community. We used shotgun sequencing to examine microbial communities in water samples collected by the Sorcerer II Global Ocean Sampling (GOS) expedition. Our analysis predicted more than six million proteins in the GOS data—nearly twice the number of proteins present in current databases. These predictions add tremendous diversity to known protein families and cover nearly all known prokaryotic protein families. Some of the predicted proteins had no similarity to any currently known proteins and therefore represent new families. A higher than expected fraction of these novel families is predicted to be of viral origin. We also found that several protein domains that were previously thought to be kingdom specific have GOS examples in other kingdoms. 
Our analysis opens the door for a multitude of follow-up protein family analyses and indicates that we are a long way from sampling all the protein families that exist in nature. The GOS data identified 6.12 million predicted proteins covering nearly all known prokaryotic protein families, and several new families. This almost doubles the number of known proteins and shows that we are far from identifying all the proteins in nature. MESH Headings: Expressed Sequence Tags; Oceans and Seas; Proteins/chemistry; Proteins/genetics; Water Microbiology. Grants: T32 HG000047 NHGRI NIH HHS; R01 GM073109 NIGMS NIH HHS; 5U54 RR020843-02 NCRR NIH HHS; 5T32 HG00047 NHGRI NIH HHS; K22 HG000056 NHGRI NIH HHS; K22 HG00056 NHGRI NIH HHS; P20 GM068136 NIGMS NIH HHS; U54 RR020843 NCRR NIH HHS; P30 CA014195 NCI NIH HHS
27	Using galaxy to perform large-scale interactive data analyses. Curr Protoc Bioinformatics 2007;Chapter 10:Unit 10.5. [PMID: 18428782 PMCID: PMC3418382 DOI: 10.1002/0471250953.bi1005s19] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Indexed: 02/01/2023] Abstract While most experimental biologists know where to download genomic data, few have a concrete plan for how to analyze it. This situation can be corrected by (1) providing unified portals serving genomic data and (2) building Web applications that allow flexible retrieval and on-the-fly analyses of the data. Powerful resources, such as the UCSC Genome Browser, already address the first issue. The second issue, however, remains open. For example, how would one find the human protein-coding exons with the highest density of single nucleotide polymorphisms (SNPs) and extract orthologous sequences from all sequenced mammals? Indeed, one can access all relevant data from the UCSC Genome Browser. But once the data are downloaded, how would one deal with millions of SNPs and gigabytes of alignments? Galaxy (http://g2.bx.psu.edu) is designed specifically for that purpose. It amplifies the strengths of existing resources (such as the UCSC Genome Browser) by allowing the user to access and, most importantly, analyze data within a single interface in an unprecedented number of ways. Key Words: comparative genomics; genomic alignments; web application; genome variation. MESH Headings: Algorithms; Base Sequence; Chromosome Mapping/methods; Computer Graphics; DNA/genetics; DNA Mutational Analysis/methods; Molecular Sequence Data; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; User-Computer Interface. Grants: R01 GM072264 NIGMS NIH HHS; R01 HG004909 NHGRI NIH HHS; R21 HG005133 NHGRI NIH HHS; RC2 HG005542 NHGRI NIH HHS
28	Rapid birth-death evolution specific to xenobiotic cytochrome P450 genes in vertebrates. PLoS Genet 2007;3:e67. [PMID: 17500592 PMCID: PMC1866355 DOI: 10.1371/journal.pgen.0030067] [Citation(s) in RCA: 145] [Impact Index Per Article: 8.5] [Received: 12/06/2006] [Accepted: 03/14/2007] [Indexed: 12/15/2022] Abstract Genes vary greatly in their long-term phylogenetic stability and there exists no general explanation for these differences. The cytochrome P450 (CYP450) gene superfamily is well suited to investigating this problem because it is large and well studied, and it includes both stable and unstable genes. CYP450 genes encode oxidase enzymes that function in metabolism of endogenous small molecules and in detoxification of xenobiotic compounds. Both types of enzymes have been intensively studied. My analysis of ten nearly complete vertebrate genomes indicates that each genome contains 50-80 CYP450 genes, which are about evenly divided between phylogenetically stable and unstable genes. The stable genes are characterized by few or no gene duplications or losses in species ranging from bony fish to mammals, whereas unstable genes are characterized by frequent gene duplications and losses (birth-death evolution) even among closely related species. All of the CYP450 genes that encode enzymes with known endogenous substrates are phylogenetically stable. In contrast, most of the unstable genes encode enzymes that function as xenobiotic detoxifiers. Nearly all unstable CYP450 genes in the mouse and human genomes reside in a few dense gene clusters, forming unstable gene islands that arose by recurrent local gene duplication. 
Evidence for positive selection in amino acid sequence is restricted to these unstable CYP450 genes, and sites of selection are associated with substrate-binding regions in the protein structure. These results can be explained by a general model in which phylogenetically stable genes have core functions in development and physiology, whereas unstable genes have accessory functions associated with unstable environmental interactions such as toxin and pathogen exposure. Unstable gene islands in vertebrates share some functional properties with bacterial genomic islands, though they arise by local gene duplication rather than horizontal gene transfer. MESH Headings: Amino Acid Sequence; Animals; Cluster Analysis; Cytochrome P-450 Enzyme System/chemistry; Cytochrome P-450 Enzyme System/genetics; Evolution, Molecular; Gene Duplication; Genomic Instability; Humans; Likelihood Functions; Macaca/genetics; Molecular Sequence Data; Phylogeny; Selection, Genetic; Synteny; Vertebrates/genetics; Xenobiotics/metabolism
29	A partially supervised classification approach to dominant and recessive human disease gene prediction. Comput Methods Programs Biomed 2007;85:229-37. [PMID: 17258838 DOI: 10.1016/j.cmpb.2006.12.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 10/02/2006] [Revised: 11/30/2006] [Accepted: 12/08/2006] [Indexed: 05/13/2023] Abstract Discovering the genes involved in genetic diseases is an important step towards understanding the nature of these diseases. Laboratory identification is a difficult, time-consuming task in which computational methods can be very useful, and in silico identification algorithms can guide future studies. Previous work on this topic has not taken into account that no reliable set of negative examples is available, because it is not possible to ensure that a given gene is unrelated to any genetic disease. In this paper, we take this feature of the problem into account and approach identification as a partially supervised classification problem. In addition, we use a more specific method that, for the first time, identifies disease genes by classifying genes causing dominant and recessive diseases independently. We base this separation on previous results showing that these two types of genes differ in their sequence properties. We apply a new model averaging algorithm to the identification of human genes associated with both dominant and recessive Mendelian diseases. MESH Headings: Algorithms; Genes, Dominant/genetics; Genes, Recessive/genetics; Genetic Predisposition to Disease/classification; Humans; Sequence Analysis, DNA; Spain
30	Novel candidate targets of Wnt/beta-catenin signaling in hepatoma cells. Life Sci 2006;80:690-8. [PMID: 17157329 DOI: 10.1016/j.lfs.2006.10.024] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Received: 05/08/2006] [Revised: 09/21/2006] [Accepted: 10/26/2006] [Indexed: 02/06/2023] Abstract The activity of beta-catenin/TCF, the key component of the Wnt signaling pathway, is frequently deregulated in hepatocellular carcinoma (HCC), resulting in the activation of genes whose dysregulation has significant consequences for tumor development. Therefore, identifying the target genes of Wnt signaling is important for understanding beta-catenin-mediated carcinogenesis. We analyzed the transcriptome profiles of human hepatoma cell lines using cDNA microarrays representing 15,127 unique, liver-enriched gene loci to identify the target genes of beta-catenin-mediated transcription (p<0.005). This analysis yielded 130 potential Wnt-associated classifier genes, 33 of which contain consensus TCF-binding sites in presumptive transcriptional regulatory sequences. These genes were then tested for Wnt-dependent expression in experimental models of Wnt activation. Genes such as RPL29, NEDD4L, FUT8, LYZ, STMN2, STARD7 and KIAA0998 were shown to be up-regulated upon Wnt/beta-catenin activation. Gene ontology analysis of the 33 candidate genes indicated the presence of functional categories relevant to the Wnt pathway, such as cell growth, proliferation, adhesion and signal transduction. In conclusion, we identified a number of candidate Wnt/beta-catenin target genes that can be useful for studying the role of altered Wnt signaling in liver cancer development, and showed that some of them might be direct targets of Wnt signaling in hepatoma cells. 
MESH Headings: Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/metabolism; Cell Line, Tumor; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Liver Neoplasms/genetics; Liver Neoplasms/metabolism; Oligonucleotide Array Sequence Analysis; Signal Transduction; Wnt Proteins/genetics; Wnt Proteins/metabolism; beta Catenin/genetics; beta Catenin/metabolism
31	Prediction of the deleterious nsSNPs in ABCB transporters. FEBS Lett 2006;580:6800-6. [PMID: 17141228 DOI: 10.1016/j.febslet.2006.11.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/11/2006] [Revised: 11/02/2006] [Accepted: 11/14/2006] [Indexed: 01/11/2023] Abstract Non-synonymous SNPs (nsSNPs) in coding regions, whether neutral or deleterious, can alter the function or structure of proteins. We developed computational models to analyze deleterious nsSNPs in transporters and to predict them in ABCB (ATP-binding cassette B) transporters of interest. The RPLS (ridge partial least squares) and LDA (linear discriminant analysis) methods were applied to the problem by training on a selection of datasets from a specified source, i.e., human transporters. The best combination of datasets and prediction attributes was determined. The prediction accuracy of the theoretical RPLS model for the training and testing sets is 84.8% and 80.4%, respectively (LDA: 84.3% and 80.4%), indicating that the models are reasonable and may be helpful for pharmacogenetics studies. MESH Headings: ATP-Binding Cassette Transporters/classification; ATP-Binding Cassette Transporters/genetics; ATP-Binding Cassette Transporters/metabolism; Amino Acids/genetics; Amino Acids/metabolism; Computational Biology; Humans; Models, Biological; Polymorphism, Single Nucleotide/genetics
32	NATsDB: Natural Antisense Transcripts DataBase. Nucleic Acids Res 2006;35:D156-61. [PMID: 17082204 PMCID: PMC1635336 DOI: 10.1093/nar/gkl782] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Indexed: 01/06/2023] Abstract Natural antisense transcripts (NATs) are at least partly reverse-complementary to the sequences of other endogenous sense transcripts. Most NATs are transcribed from the opposite strands of their sense partners. They regulate sense genes at multiple levels and are implicated in various diseases. Using an improved whole-genome computational pipeline, we identified abundant cis-encoded exon-overlapping sense-antisense (SA) gene pairs in human (7356), mouse (6806), fly (1554), and eight other eukaryotic species (total 6534). We developed NATsDB (Natural Antisense Transcripts DataBase, http://natsdb.cbi.pku.edu.cn/) to enable efficient browsing, searching and downloading of what is currently the most comprehensive collection of SA genes, grouped into six classes based on their overlapping patterns. NATsDB also includes non-exon-overlapping bidirectional (NOB) genes and non-bidirectional (NBD) genes. To facilitate the study of function, regulation and possible pathological implications, NATsDB includes extensive information about gene structures, poly(A) signals and tails, phastCons conservation, homologues in other species, repeat elements, expressed sequence tag (EST) expression profiles and OMIM disease association. NATsDB supports interactive graphical display of the alignment of all supporting EST and mRNA transcripts of the SA and NOB genes to the genomic loci. It supports advanced search by species, gene name, sequence accession number, chromosome location, coding potential, OMIM association and sequence similarity. 
Collapse Key Words Collapse MESH Headings Animals Base Sequence Databases, Nucleic Acid Dogs Genes Humans Internet Mice Molecular Sequence Data RNA, Antisense/chemistry RNA, Antisense/genetics RNA, Antisense/metabolism RNA, Messenger/chemistry Rats User-Computer Interface Collapse Grants Intramural NIH HHS Collapse Affiliation(s) Collapse
33	Comparative genomic analysis of human and chimpanzee indicates a key role for indels in primate evolution. J Mol Evol 2006;63:682-90. [PMID: 17075697 DOI: 10.1007/s00239-006-0045-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 02/24/2006] [Accepted: 04/20/2006] [Indexed: 11/27/2022] Abstract Sequence comparison of humans and chimpanzees is of interest to understand the mechanisms behind primate evolution. Here we present an independent analysis of human chromosome 21 and the high-quality BAC clone sequences of the homologous chimpanzee chromosome 22. In contrast to previous studies, we have used global alignment methods and Ensembl predictions of protein coding genes (n = 224) for the analysis. Divergence due to insertions and deletions (indels) along with substitutions was examined separately for different genomic features (coding, noncoding genic, and intergenic sequence). The major part of the genomic divergence could be attributed to indels (5.07%), while the nucleotide divergence was estimated as 1.52%. Thus the total divergence was estimated as 6.58%. When excluding repeats and low-complexity DNA the total divergence decreased to 2.37%. The chromosomal distribution of nucleotide substitutions and indel events was significantly correlated. To further examine the role of indels in primate evolution we focused on coding sequences. Indels were found within the coding sequence of 13% of the genes and approximately half of the indels have not been reported previously. In 5% of the chimpanzee genes, indels or substitutions caused premature stop codons that rendered the affected transcripts nonfunctional. Taken together, our findings demonstrate that indels comprise the majority of the genomic divergence. Furthermore, indels occur frequently in coding sequences. Our results thereby support the hypothesis that indels may have a key role in primate evolution.
MeSH Headings: Animals; Base Sequence; Chromosome Mapping; DNA, Intergenic/genetics; Evolution, Molecular; Genetic Variation; Genome/genetics; Humans; Molecular Sequence Data; Mutagenesis, Insertional; Open Reading Frames/genetics; Pan troglodytes; Primates/genetics; Sequence Alignment; Sequence Analysis, DNA; Sequence Deletion
34	A global genomic transcriptional code associated with CNS-expressed genes. Exp Cell Res 2006;312:3108-19. [PMID: 16919269 DOI: 10.1016/j.yexcr.2006.06.017] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 04/22/2006] [Revised: 06/05/2006] [Accepted: 06/12/2006] [Indexed: 01/28/2023] Abstract Highly conserved non-coding DNA regions (HCNR) occur frequently in vertebrate genomes, but their functional roles remain unclear. Here, we provide evidence that a large portion of HCNRs are enriched for binding sites for Sox, POU and Homeodomain transcription factors, and such HCNRs can act as cis-regulatory regions active in neural stem cells. Strikingly, these HCNRs are linked to several hundreds of genes expressed in the developing CNS and they may exert locus-wide regulatory effects on multiple genes flanking their genomic location. Moreover, these data imply a unifying transcriptional logic for a large set of CNS-expressed genes in which Sox and POU proteins act as generic promoters of transcription while Homeodomain proteins control the spatial expression of genes through active repression.
MeSH Headings: Animals; Base Sequence; Binding Sites; Body Patterning/genetics; Cells, Cultured; Central Nervous System/metabolism; Chick Embryo; Conserved Sequence/genetics; Down-Regulation/genetics; Genome/genetics; Genomics; High Mobility Group Proteins/metabolism; Homeodomain Proteins/genetics; Humans; Introns/genetics; Mice; Molecular Sequence Data; Neurons/metabolism; POU Domain Factors/metabolism; Regulatory Sequences, Nucleic Acid/genetics; Tetraodontiformes/genetics; Transcription, Genetic/genetics
Grants: MC_U137761446 Medical Research Council
35	Genomes as geography: using GIS technology to build interactive genome feature maps. BMC Bioinformatics 2006;7:416. [PMID: 16984652 PMCID: PMC1599760 DOI: 10.1186/1471-2105-7-416] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 02/02/2006] [Accepted: 09/19/2006] [Indexed: 11/10/2022] Abstract BACKGROUND Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. RESULTS We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. CONCLUSION Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries.
MeSH Headings: Animals; Chromosome Mapping/methods; Databases, Genetic; Gene Expression Regulation; Genome/genetics; Geographic Information Systems; Mice
36	AgBase: a functional genomics resource for agriculture. BMC Genomics 2006;7:229. [PMID: 16961921 PMCID: PMC1618847 DOI: 10.1186/1471-2164-7-229] [Citation(s) in RCA: 224] [Impact Index Per Article: 12.4] [Received: 04/18/2006] [Accepted: 09/08/2006] [Indexed: 11/16/2022] Abstract BACKGROUND Many agricultural species and their pathogens have sequenced genomes and more are in progress. Agricultural species provide food, fiber, xenotransplant tissues, biopharmaceuticals and biomedical models. Moreover, many agricultural microorganisms are human zoonoses. However, systems biology from functional genomics data is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation and agricultural research communities are smaller with limited funding compared to many model organism communities. DESCRIPTION To facilitate systems biology in these traditionally agricultural species we have established "AgBase", a curated, web-accessible, public resource http://www.agbase.msstate.edu for structural and functional annotation of agricultural genomes. The AgBase database includes a suite of computational tools to use GO annotations. We use standardized nomenclature following the Human Genome Organization Gene Nomenclature guidelines and are currently functionally annotating chicken, cow and sheep gene products using the Gene Ontology (GO). The computational tools we have developed accept and batch process data derived from different public databases (with different accession codes), return all existing GO annotations, provide a list of products without GO annotation, identify potential orthologs, model functional genomics data using GO and assist proteomics analysis of ESTs and EST assemblies. Our journal database helps prevent redundant manual GO curation. We encourage and publicly acknowledge GO annotations from researchers and provide a service for researchers interested in GO and analysis of functional genomics data. CONCLUSION The AgBase database is the first database dedicated to functional genomics and systems biology analysis for agriculturally important species and their pathogens. We use experimental data to improve structural annotation of genomes and to functionally characterize gene products. AgBase is also directly relevant for researchers in fields as diverse as agricultural production, cancer biology, biopharmaceuticals, human health and evolutionary biology. Moreover, the experimental methods and bioinformatics tools we provide are widely applicable to many other species including model organisms.
MeSH Headings: Agriculture; Animals; Databases, Genetic; Databases, Protein; Genome/genetics; Genomics; Humans
37	An ancient transcriptional regulatory linkage. Dev Biol 2006;281:299-308. [PMID: 15893980 DOI: 10.1016/j.ydbio.2005.03.004] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Received: 02/02/2005] [Revised: 03/08/2005] [Accepted: 03/08/2005] [Indexed: 11/18/2022] Abstract Changes in gene regulatory networks are a major engine for creating developmental novelty during evolution. Conversely, regulatory linkages that survive for very long evolutionary periods might be characteristic of ancient and abstract functions of fundamental utility to all metazoans. The proneural genes, which encode a distinctive family of basic helix-loop-helix (bHLH) transcriptional activators, act to promote neural cell fates in the ectoderm of diverse species. Here we report that these genes have been associated for at least 600-700 million years--since before the cnidarian/bilaterian divergence--with a high-affinity binding site for Hairy/Enhancer of split (Hes) repressor proteins. We suggest that the systematic identification of such ancient and conserved connections will be a powerful means of uncovering the primordial functions of transcription factors and signaling systems.
MeSH Headings: Animals; Base Sequence; Basic Helix-Loop-Helix Transcription Factors/genetics; Basic Helix-Loop-Helix Transcription Factors/metabolism; Binding Sites; Evolution, Molecular; Insect Proteins/genetics; Insect Proteins/metabolism; Insecta/genetics; Molecular Sequence Data; Phylogeny; Rats; Repressor Proteins/genetics; Repressor Proteins/metabolism; Transcription, Genetic
Grants: GM046993 NIGMS NIH HHS
38	From information to understanding: the role of model organism databases in comparative and functional genomics. Anim Genet 2006;37 Suppl 1:28-40. [PMID: 16887000 DOI: 10.1111/j.1365-2052.2006.01475.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022] Abstract Data integration is key to functional and comparative genomics because integration allows diverse data types to be evaluated in new contexts. To achieve data integration in a scalable and sensible way, semantic standards are needed, both for naming things (standardized nomenclatures, use of key words) and also for knowledge representation. The Mouse Genome Informatics database and other model organism databases help to close the gap between information and understanding of biological processes because these resources enforce well-defined nomenclature and knowledge representation standards. Model organism databases have a critical role to play in ensuring that diverse kinds of data, especially genome-scale data sets and information, remain useful to the biological community in the long-term. The efforts of model organism database groups ensure not only that organism-specific data are integrated, curated and accessible but also that the information is structured in such a way that comparison of biological knowledge across model organisms is facilitated.
MeSH Headings: Animals; Computational Biology; Databases, Genetic; Genomics/methods; Humans; Mice/genetics; Models, Animal; Phenotype; Terminology as Topic
Grants: P41 HG-002273 NHGRI NIH HHS; P41 HG00330 NHGRI NIH HHS; R01 CA89713 NCI NIH HHS; R01 HD33745 NICHD NIH HHS
39	Prediction of cis-regulatory elements for drug-activated transcription factors in the regulation of drug-metabolising enzymes and drug transporters. Expert Opin Drug Metab Toxicol 2006;2:367-79. [PMID: 16863440 DOI: 10.1517/17425255.2.3.367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022] Abstract The expression of drug-metabolising enzymes is affected by many endogenous and exogenous factors, including sex, age, diet and exposure to xenobiotics and drugs. To understand fully how the organism metabolises a drug, these alterations in gene expression must be taken into account. The central process, the definition of likely regulatory elements in the genes coding for enzymes and transporters involved in drug disposition, can be vastly accelerated using existing and emerging bioinformatics methods to unravel the regulatory networks causing drug-mediated induction of genes. Here, various approaches to predict transcription factor interactions with regulatory DNA elements are reviewed.
MeSH Headings: Algorithms; Animals; Binding Sites; Carrier Proteins/genetics; Carrier Proteins/metabolism; Computational Biology/methods; Cytochrome P-450 Enzyme System/genetics; Cytochrome P-450 Enzyme System/metabolism; DNA/genetics; DNA/metabolism; Databases, Genetic; Gene Expression Regulation, Enzymologic/drug effects; Humans; Markov Chains; Models, Genetic; Promoter Regions, Genetic/genetics; Receptors, Cytoplasmic and Nuclear/metabolism; Transcription Factors/metabolism; Xenobiotics/metabolism; Xenobiotics/pharmacology
40	Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species. Nucleic Acids Res 2006;34:3465-75. [PMID: 16849434 PMCID: PMC1524920 DOI: 10.1093/nar/gkl473] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.4] [Indexed: 12/29/2022] Abstract We developed a fast, integrative pipeline to identify cis natural antisense transcripts (cis-NATs) at genome scale. The pipeline mapped mRNAs and ESTs in UniGene to genome sequences in GoldenPath to find overlapping transcripts and combined information from coding sequence, poly(A) signal, poly(A) tail and splicing sites to deduce transcription orientation. We identified cis-NATs in 10 eukaryotic species, including 7830 candidate sense–antisense (SA) genes in 3915 SA pairs in human. The abundance of SA genes is remarkably low in worm and does not seem to be caused by the prevalence of operons. Hundreds of SA pairs are conserved across different species, even maintaining the same overlapping patterns. The convergent SA class is prevalent in fly, worm and sea squirt, but not in human or mouse as reported previously. The percentage of SA genes among imprinted genes in human and mouse is 24–47%, a range between the two previous reports. There is a significant shortage of SA genes on Chromosome X in human and mouse but not in fly or worm, supporting X-inactivation in mammals as a possible cause. SA genes are over-represented in the catalytic activities and basic metabolism functions. All candidate cis-NATs can be downloaded from .
MeSH Headings: Animals; Base Sequence; Cattle; Computational Biology/methods; Conserved Sequence; Genomic Imprinting; Genomics/methods; Humans; Mice; Molecular Sequence Data; RNA, Antisense/chemistry; RNA, Antisense/classification; RNA, Antisense/genetics; Rats; X Chromosome
Grants: R01 HG004069 NHGRI NIH HHS; R01 HG004069-01 NHGRI NIH HHS
41	Impact of constitutive IGF1/IGF2 stimulation on the transcriptional program of human breast cancer cells. Carcinogenesis 2006;28:49-59. [PMID: 16774935 DOI: 10.1093/carcin/bgl091] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022] Abstract Insulin-like growth factor (IGF) signaling is a key regulator of breast development and breast cancer. We have analyzed the expression of the IGF signaling cascade in 17 human breast cancer and 4 mammary epithelial cell lines. Five cell lines expressed high levels of IGF1 receptor, insulin (INS)/IGF receptor substrate 1, IGF-binding proteins 2 and 4, as well as the estrogen receptor (ESR), indicating a co-activation of IGF and ESR signaling. Next, we stably overexpressed IGF1 and IGF2 in MCF7 breast cancer cells, which did not affect their epithelial characteristics and the expression and localization of the epithelial marker genes E-cadherin and beta-catenin. Conversely, IGF1 and IGF2 overexpression potently increased cellular proliferation rates and the efficiency of tumor formation in mouse xenograft experiments, whereas the resistance to chemotherapeutic drugs such as taxol was unaltered. Expression profiling of overexpressing cells with whole-genome oligonucleotide microarrays revealed that 21 genes were upregulated >2-fold by both IGF1 and IGF2, 9 by IGF1, and 9 by IGF2. Half of the genes found to be upregulated are involved in transport and biosynthesis of amino acids, including several amino acid transport proteins, argininosuccinate and asparagine synthetases, and methionyl-tRNA synthetase. Upregulation of these genes constitutes a novel mechanism apparently contributing to the stimulatory effects of IGF signaling on the global protein synthesis rate. We conclude that the induction of cell proliferation and tumor formation by long-term IGF stimulation may primarily be due to anabolic effects, in particular increased amino acid production and uptake.
MeSH Headings: Animals; Blotting, Western; Breast Neoplasms/metabolism; Cell Proliferation; Cell Survival; Drug Resistance, Neoplasm; Female; Fluorescent Antibody Technique; Genome, Human; Humans; Immunoenzyme Techniques; Insulin-Like Growth Factor I/metabolism; Insulin-Like Growth Factor II/metabolism; Mice; Mice, Inbred BALB C; Mice, Nude; Mice, SCID; Oligonucleotide Array Sequence Analysis; Paclitaxel/pharmacology; Reverse Transcriptase Polymerase Chain Reaction; Signal Transduction; Thymidine/metabolism; Transcription, Genetic; Transplantation, Heterologous; Tumor Cells, Cultured
42	Verification of predicted alternatively spliced Wnt genes reveals two new splice variants (CTNNB1 and LRP5) and altered Axin-1 expression during tumour progression. BMC Genomics 2006;7:148. [PMID: 16772034 PMCID: PMC1523213 DOI: 10.1186/1471-2164-7-148] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 11/01/2005] [Accepted: 06/13/2006] [Indexed: 11/25/2022] Abstract Background Splicing processes might play a major role in carcinogenesis and tumour progression. The Wnt pathway is of crucial relevance for cancer progression. Therefore we focussed on the Wnt/β-catenin signalling pathway in order to validate the expression of sequences predicted as alternatively spliced by bioinformatic methods. Splice variants of its key molecules were selected, which may be critical components for the understanding of colorectal tumour progression and may have the potential to act as biological markers. For some of the Wnt pathway genes the existence of splice variants was either proposed (e.g. β-Catenin and CTNNB1) or described only in non-colon tissues (e.g. GSK3β) or hitherto not published (e.g. LRP5). Results Both splice variants – normal and alternative form – of all selected Wnt pathway components were found to be expressed in cell lines as well as in samples derived from tumour, normal and healthy tissues. All splice positions corresponded exactly to the bioinformatic prediction, as shown by sequencing. Two hitherto undescribed alternative splice forms (CTNNB1 and LRP5) were detected. Although the underlying EST data used for the bioinformatic analysis suggested tumour-specific expression, neither a qualitative nor a significant quantitative difference between the expression in tumour and healthy tissues was detected. Axin-1 expression was reduced in later stages and in samples from carcinomas forming distant metastases. Conclusion We were the first to describe that splice forms of crucial genes of the Wnt pathway are expressed in human colorectal tissue. Newly described splice forms were found for β-Catenin, LRP5, GSK3β, Axin-1 and CtBP1. However, the cancer specificity suggested by the origin of the underlying ESTs was not confirmed, either qualitatively or by a significant quantitative difference. This led us to conclude that EST sequence data can give adequate hints for the existence of alternative splicing in tumour tissues. That no difference in the expression of these splice forms was found between cancerous tissues and normal mucosa may indicate that the existence of different splice forms is of less significance for cancer formation than suggested by the available EST data. The currently available EST source is still insufficient to clearly deduce colon cancer specificity. More EST data from colon (tumour and healthy) are required to make reliable predictions.
MeSH Headings: Aged; Alcohol Oxidoreductases/genetics; Alternative Splicing/genetics; Axin Protein; DNA, Complementary/genetics; DNA-Binding Proteins/genetics; Down-Regulation/genetics; Female; Gene Expression Regulation, Neoplastic; Glycogen Synthase Kinase 3/genetics; Glycogen Synthase Kinase 3 beta; Humans; LDL-Receptor Related Proteins/genetics; Low Density Lipoprotein Receptor-Related Protein-5; Male; Middle Aged; Neoplasm Metastasis/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism; Repressor Proteins/genetics; Reproducibility of Results; Tumor Cells, Cultured; Up-Regulation/genetics; Wnt Proteins/genetics; beta Catenin/genetics
43	Evolutionary origin and maintenance of coexpressed gene clusters in mammals. Mol Biol Evol 2006;23:1715-23. [PMID: 16757654 DOI: 10.1093/molbev/msl034] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.9] [Indexed: 02/01/2023] Abstract Gene order is not random with regard to gene expression in mammals: coexpressed genes, and in particular housekeeping genes, are clustered along chromosomes more often than expected by chance. To understand the origin of these clusters and to quantify the impact of this phenomenon on genome organization, we analyzed clusters of coexpressed genes in the human and mouse genomes. We show that neighboring genes experience continuous concerted expression changes during evolution, which leads to the formation of coexpressed gene clusters. The pattern of expression within these clusters evolves more slowly than the genomic average. Moreover, by studying gene order evolution, we show that some clusters are maintained by natural selection and, therefore, have a functional significance. However, we also demonstrate that some coexpressed gene clusters are the result of neutral coevolution effects, as illustrated by the clustering of genes escaping inactivation on the X chromosome. Moreover, we show that, although statistically significant, constraints on gene orders have a limited impact on mammalian genome organization, affecting only 3-5% of the pool of human and murine genes. It had been hypothesized that coexpressed gene clusters might correspond to large chromatin domains. In contrast, we find that most of these clusters contain only 2 genes whose coexpression may be due to transcriptional read-through or the activity of bidirectional promoters.
MeSH Headings: Animals; Chromosomes, Human, X; Chromosomes, Human, Y; Cluster Analysis; Databases, Protein; Evolution, Molecular; Gene Expression Regulation; Gene Order; Genome; Genome, Human; Humans; Mice; Multigene Family; Selection, Genetic; Transcription, Genetic
44	A summary statistic approach to sequence variation in noncoding regions of six schizophrenia-associated gene loci. Eur J Hum Genet 2006;14:1037-43. [PMID: 16736033 DOI: 10.1038/sj.ejhg.5201664] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022] Abstract In order to explore the role of noncoding variants in the genetics of schizophrenia, we sequenced 27 kb of noncoding DNA from the gene loci RAC-alpha serine/threonine-protein kinase (AKT1), brain-derived neurotrophic factor (BDNF), dopamine receptor-3 (DRD3), dystrobrevin binding protein-1 (DTNBP1), neuregulin-1 (NRG1) and regulator of G-protein signaling-4 (RGS4) in 37 schizophrenia patients and 25 healthy controls. To compare the allele frequency spectrum between the two samples, we separately computed Tajima's D-value for each sample. The results showed a smaller Tajima's D-value in the case sample, pointing to an excess of rare variants as compared to the control sample. When randomly permuting the affection status of sequenced individuals, we observed a stronger decrease of Tajima's D in 2400 out of 100,000 permutations, corresponding to a P-value of 0.024 in a one-sided test. Thus, rare variants are significantly enriched in the schizophrenia sample, indicating the existence of disease-related sequence alterations. When categorizing the sequenced fragments according to their level of human-rodent conservation or according to their gene locus, we observed a wide range of diversity parameter estimates. Rare variants were enriched in conserved regions as compared to nonconserved regions in both samples. Nevertheless, rare variants remained more common among patients, suggesting an increased number of variants under purifying selection in this sample. Finally, we performed a heuristic search for the subset of gene loci that jointly produces the strongest difference between controls and cases. This showed a more prominent role of variants from the loci AKT1, BDNF and RGS4. Taken together, our approach provides a promising strategy to investigate the genetics of schizophrenia and related phenotypes.
MeSH Headings: Brain-Derived Neurotrophic Factor/genetics; Case-Control Studies; DNA, Intergenic; Data Interpretation, Statistical; Genetic Variation; Humans; Polymorphism, Single Nucleotide; Proto-Oncogene Proteins c-akt/genetics; RGS Proteins/genetics; Schizophrenia/genetics
45	STMN2 is a novel target of beta-catenin/TCF-mediated transcription in human hepatoma cells. Biochem Biophys Res Commun 2006;345:1059-67. [PMID: 16712787 DOI: 10.1016/j.bbrc.2006.05.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 04/28/2006] [Accepted: 05/04/2006] [Indexed: 02/06/2023] Abstract The activity of beta-catenin/TCF, the key component of Wnt signaling pathway, is frequently deregulated in human cancers, resulting in the activation of genes whose dysregulation has significant consequences on tumor development. Therefore, identifying the target genes of Wnt signaling is important for understanding beta-catenin-mediated carcinogenesis. Here, we report STMN2, a gene implicated in the regulation of microtubule dynamics, as a novel target of beta-catenin-mediated transcription. STMN2 was up-regulated in hepatoma and cirrhotic liver tissues compared to normal liver and also in cell lines where beta-catenin/TCF is constitutively activated. Transient activation of beta-catenin/TCF either by transfection of a constitutively active form of beta-catenin or by LiCl treatment induced the STMN2 mRNA expression in PLC/PRF/5 cells. Of the four members of STMN gene family, only STMN2 showed a Wnt-dependent expression pattern. Through promoter mapping and chromatin immunoprecipitation assays, we found that STMN2 is a direct target of beta-catenin/TCF-mediated transcription and that the TCF binding site at -1713 of STMN2 promoter is critical for beta-catenin/TCF-dependent expression regulation. siRNA-mediated knock-down of STMN2 expression indicated that STMN2 is required for maintaining the anchorage-independent growth state of beta-catenin/TCF-activated hepatoma cells. Our results suggest that STMN2 might be a novel player in beta-catenin/TCF-mediated carcinogenesis in the liver.
MeSH Headings: Carcinoma, Hepatocellular/metabolism; Cell Line, Tumor; Cell Proliferation; Gene Expression Regulation, Neoplastic; Humans; Liver/metabolism; Membrane Proteins; Microtubules/metabolism; Nerve Growth Factors/metabolism; RNA Interference; RNA, Messenger/metabolism; Stathmin; T Cell Transcription Factor 1/metabolism; Transcription, Genetic; Up-Regulation; beta Catenin/metabolism
46	Gene profiling and bioinformatic analysis of Schwann cell embryonic development and myelination. Glia 2006;53:501-15. [PMID: 16369933 DOI: 10.1002/glia.20309] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Abstract To elucidate the molecular mechanisms involved in Schwann cell development, we profiled gene expression in the developing and injured rat sciatic nerve. The genes that showed significant changes in expression in developing and dedifferentiated nerve were validated with RT-PCR, in situ hybridisation, Western blot and immunofluorescence. A comprehensive approach to annotating micro-array probes and their associated transcripts was performed using Biopendium, a database of sequence and structural annotation. This approach significantly increased the number of genes for which a functional insight could be found. The analysis implicates agrin and two members of the collapsin response-mediated protein (CRMP) family in the switch from precursors to Schwann cells, and synuclein-1 and alphaB-crystallin in peripheral nerve myelination. We also identified a group of genes typically related to chondrogenesis and cartilage/bone development, including type II collagen, that were expressed in a manner similar to that of myelin-associated genes. The comprehensive function annotation also identified, among the genes regulated during nerve development or after nerve injury, proteins belonging to high-interest families, such as cytokines and kinases, and should therefore provide a uniquely valuable resource for future research. 
MeSH Headings: Agrin/biosynthesis; Agrin/genetics; Animals; Blotting, Western; Bucladesine/pharmacology; Cells, Cultured; Computational Biology; Cytokines/biosynthesis; Cytokines/genetics; Embryonic Development/physiology; Female; Flow Cytometry; Fluorescent Antibody Technique; Gene Expression Profiling; In Situ Hybridization; Interleukin-8/biosynthesis; Interleukin-8/genetics; Myelin Sheath/physiology; Nerve Tissue Proteins/biosynthesis; Nerve Tissue Proteins/genetics; Neural Crest/cytology; Neural Crest/embryology; Phosphoproteins/biosynthesis; Phosphoproteins/genetics; Pregnancy; Rats; Rats, Sprague-Dawley; Reverse Transcriptase Polymerase Chain Reaction; Schwann Cells/physiology; Sciatic Neuropathy/pathology; alpha-Crystallin B Chain/biosynthesis; alpha-Crystallin B Chain/genetics; alpha-Synuclein/biosynthesis; alpha-Synuclein/genetics
Grants: Wellcome Trust
47	Large-scale trends in the evolution of gene structures within 11 animal genomes. PLoS Comput Biol 2006;2:e15. [PMID: 16518452 PMCID: PMC1386723 DOI: 10.1371/journal.pcbi.0020015] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.5] [Received: 11/07/2005] [Accepted: 01/18/2006] [Indexed: 01/02/2023]
Abstract: We have used the annotations of six animal genomes (Homo sapiens, Mus musculus, Ciona intestinalis, Drosophila melanogaster, Anopheles gambiae, and Caenorhabditis elegans) together with the sequences of five unannotated Drosophila genomes to survey changes in protein sequence and gene structure over a variety of timescales, from the less than 5 million years since the divergence of D. simulans and D. melanogaster to the more than 500 million years that have elapsed since the Cambrian explosion. To do so, we have developed a new open-source software library called CGL (for “Comparative Genomics Library”). Our results demonstrate that change in intron–exon structure is gradual, clock-like, and largely independent of coding-sequence evolution. This means that genome annotations can be used in new ways to inform, corroborate, and test conclusions drawn from comparative genomics analyses that are based upon protein and nucleotide sequence similarities.
Just as protein sequences change over time, so do gene structures. Over comparatively short evolutionary timescales, introns lengthen and shorten; over longer timescales, the number and positions of introns in homologous genes can change. These facts suggest that the intron–exon structures of genes may provide a source of evolutionary information. The utility of gene structures as materials for phylogenetic analyses, however, depends upon their independence from the forces driving protein evolution. If, for example, intron–exon structures are strongly influenced by selection at the amino acid level, then using them for phylogenetic investigations is largely pointless, as the same information could have been more easily gained from protein analyses. Using 11 animal genomes, Yandell et al. show that evolution of intron lengths and positions is largely (though not completely) independent of protein sequence evolution. This means that gene structures provide a source of information about the evolutionary past independent of protein sequence similarities, a finding the authors employ to investigate the accuracy of the protein clock and to explore the utility of gene structures as a means to resolve deep phylogenetic relationships within the animals.
MeSH Headings: Animals; Anopheles/genetics; Caenorhabditis elegans/genetics; Ciona intestinalis/genetics; Computational Biology/methods; Drosophila melanogaster/genetics; Evolution, Molecular; Genome; Humans; Mice; Phylogeny; Proteomics/methods; Species Specificity
Grants: P41 HG000739 NHGRI NIH HHS; U41 HG000739 NHGRI NIH HHS; HG00739 NHGRI NIH HHS; HG00750 NHGRI NIH HHS
48	NOPdb: Nucleolar Proteome Database. Nucleic Acids Res 2006;34:D218-20. [PMID: 16381850 PMCID: PMC1347367 DOI: 10.1093/nar/gkj004] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 11/25/2022]
Abstract: The Nucleolar Proteome Database (NOPdb) archives data on >700 proteins that were identified by multiple mass spectrometry (MS) analyses from highly purified preparations of human nucleoli, the most prominent nuclear organelle. Each protein entry is annotated with information about its corresponding gene, its domain structures and relevant protein homologues across species, and documents its MS identification history, including all the peptides sequenced by tandem MS/MS. Moreover, data showing the quantitative changes in the relative levels of ∼500 nucleolar proteins are compared at different timepoints upon transcriptional inhibition. Correlating changes in protein abundance at multiple timepoints, highlighted by visualization tools in the NOPdb, provides clues regarding the potential interactions and relationships between nucleolar proteins and thereby suggests putative functions for factors within the 30% of the proteome that comprises novel/uncharacterized proteins. The NOPdb is searchable by gene names, nucleotide or protein sequences, Gene Ontology terms or motifs, or by limiting the range of isoelectric points and/or molecular weights, and it links to other databases (e.g. LocusLink, OMIM and PubMed).
MeSH Headings: Cell Nucleolus/metabolism; Databases, Protein; Humans; Internet; Nuclear Proteins/chemistry; Nuclear Proteins/genetics; Nuclear Proteins/physiology; Proteome/chemistry; Proteome/genetics; Proteome/physiology; User-Computer Interface
Grants: 073980 Wellcome Trust
49	Djinn Lite: a tool for customised gene transcript modelling, annotation-data enrichment and exploration. BMC Bioinformatics 2006;7:33. [PMID: 16426464 PMCID: PMC1397871 DOI: 10.1186/1471-2105-7-33] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/10/2005] [Accepted: 01/23/2006] [Indexed: 11/10/2022]
Abstract:
BACKGROUND: Data on genetic variation, transcriptomes and proteomes are being made available at an ever-increasing rate. Similarly, a growing variety of bioinformatic programs are becoming available from many diverse sources, designed to identify a myriad of sequence patterns considered to have potential biological importance within inter-genic regions, genes, transcripts, and proteins. However, biologists require easy-to-use, uncomplicated tools to integrate this information and to visualise and print gene annotations. Integrating this information usually requires considerable informatics skills and comprehensive knowledge of the data formats involved. Tools are needed to explore gene model variants by allowing users to create alternative transcript models using novel combinations of exons not necessarily represented in current database deposits of mRNA/cDNA sequences.
RESULTS: Djinn Lite is designed to be an intuitive program for storing and visually exploring custom annotations relating to a eukaryotic gene sequence and its modelled gene products. In particular, it is helpful in developing hypotheses regarding alternative splicing of transcripts by allowing the construction of model transcripts and inspection of their resulting translations. It displays a gene and its gene products in one synchronised graphical view, allowing one to drill down into sequence-related data. Colour highlighting of selected sequences and added annotations further supports exploration and visualisation of sequence regions and motifs known or predicted to be biologically significant.
CONCLUSION: Gene annotation remains an ongoing and challenging task that will continue as gene structures, gene transcription repertoires, disease loci, protein products and their interactions become more precisely defined. Djinn Lite offers an accessible interface to help accumulate, enrich, and individualize sequence annotations relating to a gene, its transcripts and translations. The mechanism of transcript definition and creation, and the subsequent navigation and exploration of features, are very intuitive and have only a short learning curve. Ultimately, Djinn Lite can provide valuable clues for planning new experiments and can store sequences and annotations for dedicated, customised projects. The application is suitable for the Windows 98, ME, 2000, XP and 2003 operating systems.
MeSH Headings: Alternative Splicing; Animals; Base Sequence; Computational Biology/methods; Computer Graphics; DNA, Complementary/metabolism; Data Interpretation, Statistical; Database Management Systems; Databases, Genetic; Databases, Protein; Exons; Genome; Humans; Introns; Molecular Sequence Data; Proteomics/methods; RNA, Messenger/metabolism; Sequence Analysis, Protein; Sequence Homology, Nucleic Acid; Software; User-Computer Interface
50	Structural and functional properties of genes involved in human cancer. BMC Genomics 2006;7:3. [PMID: 16405732 PMCID: PMC1373651 DOI: 10.1186/1471-2164-7-3] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Received: 06/13/2005] [Accepted: 01/11/2006] [Indexed: 11/10/2022]
Abstract:
BACKGROUND: One of the main goals of cancer genetics is to identify the causative elements at the molecular level that lead to cancer.
RESULTS: We have conducted an analysis of a set of genes known to be involved in cancer in order to uncover their unique features, which can assist in identifying new candidate cancer genes.
CONCLUSION: We have detected key patterns in this group of genes in terms of the molecular functions and biological processes in which they are involved, as well as their sequence properties. Based on these features, we have developed an accurate Bayesian classification model with which human genes have been scored for their likelihood of involvement in cancer.
MeSH Headings: Amino Acid Sequence; Animals; Bayes Theorem; Conserved Sequence; Gene Expression Regulation, Neoplastic; Humans; Neoplasm Proteins/chemistry; Neoplasm Proteins/genetics; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis; Proteins/chemistry; Proteins/genetics