1
Chen Y, Wang J, Yang S, Utturkar S, Crodian J, Cummings S, Thimmapuram J, San Miguel P, Kuang S, Gribskov M, Plaut K, Casey T. Effect of high-fat diet on secreted milk transcriptome in midlactation mice. Physiol Genomics 2017; 49:747-762. [PMID: 29093195 DOI: 10.1152/physiolgenomics.00080.2017] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 12/18/2022] Open
Abstract
High-fat diet (HFD) during lactation alters milk composition and is associated with development of metabolic diseases in the offspring. We hypothesized that HFD affects milk microRNA (miRNA) and mRNA content, which potentially impact offspring development. Our objective was to determine the effect of maternal HFD on secreted milk transcriptome. To meet this objective, 4 wk old female ICR mice were divided into two treatments: control diet (CD) containing 10% kcal fat and HFD containing 60% kcal fat. After 4 wk on CD or HFD, mice were bred while continuously fed the same diets. On postnatal day 2 (P2), litters were normalized to 10 pups, and half the pups in each litter were cross-fostered between treatments. Milk was collected from dams on P10 and P12. Total RNA was isolated from milk fat fraction of P10 samples and used for mRNA-Seq and small RNA-Seq. P12 milk was used to determine macronutrient composition. After 4 wk of prepregnancy feeding, HFD mice weighed significantly more than did the control mice. Lactose and fat concentration were significantly (P < 0.05) higher in milk of HFD dams. Pup weight was significantly greater (P < 0.05) in groups suckled by HFD vs. control dams. There were 25 miRNA and over 1,500 mRNA differentially expressed (DE) in milk of HFD vs. control dams. DE mRNA and target genes of DE miRNA enriched categories that were primarily related to multicellular organismal development. Maternal HFD impacts mRNA and miRNA content of milk; if bioactive nucleic acids are absorbed by the neonate, these differences may affect development.
Affiliation(s)
- Y. Chen
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana
- J. Wang
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
- S. Yang
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana
- S. Utturkar
- Bioinformatics Core, Purdue University, West Lafayette, Indiana
- J. Crodian
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana
- S. Cummings
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana
- J. Thimmapuram
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
- P. San Miguel
- Genomics Core at Purdue University, West Lafayette, Indiana
- S. Kuang
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana
- M. Gribskov
- Bioinformatics Core, Purdue University, West Lafayette, Indiana
- K. Plaut
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana
- T. Casey
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana
2
Hengenius JB, Gribskov M, Rundell AE, Umulis DM. Making models match measurements: model optimization for morphogen patterning networks. Semin Cell Dev Biol 2014; 35:109-23. [PMID: 25016297 DOI: 10.1016/j.semcdb.2014.06.017] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/29/2013] [Revised: 06/17/2014] [Accepted: 06/24/2014] [Indexed: 01/13/2023]
Abstract
Mathematical modeling of developmental signaling networks has played an increasingly important role in the identification of regulatory mechanisms by providing a sandbox for hypothesis testing and experiment design. Whether these models consist of an equation with a few parameters or dozens of equations with hundreds of parameters, a prerequisite to model-based discovery is to bring simulated behavior into agreement with observed data via parameter estimation. These parameters provide insight into the system (e.g., enzymatic rate constants describe enzyme properties). Depending on the nature of the model fit desired - from qualitative (relative spatial positions of phosphorylation) to quantitative (exact agreement of spatial position and concentration of gene products) - different measures of data-model mismatch are used to estimate different parameter values, which contain different levels of usable information and/or uncertainty. To facilitate the adoption of modeling as a tool for discovery alongside other tools such as genetics, immunostaining, and biochemistry, careful consideration needs to be given to how well a model fits the available data, what the optimized parameter values mean in a biological context, and how the uncertainty in model parameters and predictions plays into experiment design. The core discussion herein pertains to the quantification of model-to-data agreement, which constitutes the first measure of a model's performance and future utility to the problem at hand. Integration of this experimental data and the appropriate choice of objective measures of data-model agreement will continue to drive modeling forward as a tool that contributes to experimental discovery. The Drosophila melanogaster gap gene system, in which model parameters are optimized against in situ immunofluorescence intensities, demonstrates the importance of error quantification, which is applicable to a wide array of developmental modeling studies.
Affiliation(s)
- J B Hengenius
- Department of Biological Sciences, Purdue University, 247 S. Martin Jischke Drive, West Lafayette, IN 47907, United States
- M Gribskov
- Department of Biological Sciences, Purdue University, 247 S. Martin Jischke Drive, West Lafayette, IN 47907, United States
- A E Rundell
- Weldon School of Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, IN 47907, United States
- D M Umulis
- Weldon School of Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, IN 47907, United States; Department of Agricultural and Biological Engineering, Purdue University, 225 S. University Street, West Lafayette, IN 47907, United States.
3
Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, Cunningham R, Davis J, Degroeve S, Déjardin A, Depamphilis C, Detter J, Dirks B, Dubchak I, Duplessis S, Ehlting J, Ellis B, Gendler K, Goodstein D, Gribskov M, Grimwood J, Groover A, Gunter L, Hamberger B, Heinze B, Helariutta Y, Henrissat B, Holligan D, Holt R, Huang W, Islam-Faridi N, Jones S, Jones-Rhoades M, Jorgensen R, Joshi C, Kangasjärvi J, Karlsson J, Kelleher C, Kirkpatrick R, Kirst M, Kohler A, Kalluri U, Larimer F, Leebens-Mack J, Leplé JC, Locascio P, Lou Y, Lucas S, Martin F, Montanini B, Napoli C, Nelson DR, Nelson C, Nieminen K, Nilsson O, Pereda V, Peter G, Philippe R, Pilate G, Poliakov A, Razumovskaya J, Richardson P, Rinaldi C, Ritland K, Rouzé P, Ryaboy D, Schmutz J, Schrader J, Segerman B, Shin H, Siddiqui A, Sterky F, Terry A, Tsai CJ, Uberbacher E, Unneberg P, Vahala J, Wall K, Wessler S, Yang G, Yin T, Douglas C, Marra M, Sandberg G, Van de Peer Y, Rokhsar D. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006; 313:1596-604. [PMID: 16973872 DOI: 10.1126/science.1128691] [Citation(s) in RCA: 2575] [Impact Index Per Article: 143.1] [Indexed: 11/02/2022]
Abstract
We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
Affiliation(s)
- G A Tuskan
- Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA.
4
Abstract
In this paper, we borrow the idea of the receiver operating characteristic (ROC) from clinical medicine and demonstrate its application to sequence comparison. The ROC includes elements of both sensitivity and specificity, and is a quantitative measure of the usefulness of a diagnostic. The ROC is used in this work to investigate the effects of scoring table and gap penalties on database searches. Studies on three families of proteins, 4Fe-4S ferredoxins, lysR bacterial regulatory proteins, and bacterial RNA polymerase sigma-factors lead to the following conclusions: sequence families are quite idiosyncratic, but the best PAM distance for database searches using the Smith-Waterman method is somewhat larger than predicted by theoretical methods, about 200 PAM. The length independent gap penalty (gap initiation penalty) is quite important, but shows a broad peak at values of about 20-24. The length dependent gap penalty (gap extension penalty) is almost irrelevant suggesting that successful database searches rely only to a limited degree on gapped alignments. Taken together, these observations lead to the conclusion that the optimal conditions for alignments and database searches are not, and should not be expected to be, the same.
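The ROC idea described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes each database hit carries a similarity score and a known family-membership label, and the function names `roc_points` and `roc_area` are hypothetical. It sweeps a score threshold from high to low, records the true-positive and false-positive rates, and integrates the curve with the trapezoid rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: similarity scores, higher means a better match.
    labels: truthy for true family members, falsy otherwise.
    Both inputs are assumed to be curated by the user.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for _, is_member in ranked if is_member)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one member and one non-member")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Hits with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_area(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An area of 1.0 means every true member outscores every non-member, while 0.5 corresponds to a random ranking. Comparing this area across scoring tables or gap penalties is one way to carry out the kind of parameter study the abstract describes.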
Affiliation(s)
- M Gribskov
- San Diego Supercomputer Center, P.O. Box 85608, San Diego, CA 92186-9784, USA
5
Zheng CL, Nair TM, Gribskov M, Kwon YS, Li HR, Fu XD. A database designed to computationally aid an experimental approach to alternative splicing. Pac Symp Biocomput 2004:78-88. [PMID: 14992494 DOI: 10.1142/9789812704856_0008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
A unique microarray approach has been developed to profile alternative splicing in the cell. To support the development of this approach, we have developed the Manually Annotated Alternatively Spliced Events (MAASE) database system, which is a unique alternative splicing information resource designed specifically with experimentalists in mind. MAASE is an online resource for the convenient access, identification, and annotation of alternative splicing events (ASEs). MAASE consists of two components: an annotation system and a curated database. The annotation system is a web-based workspace that combines manual and computational approaches to identifying and annotating ASEs, a combination that is vital if a comprehensive collection is to be obtained. The annotation system is publicly available and provides a scalable solution to acquiring as well as contributing to annotated ASEs. MAASE annotated ASEs are deposited into the database component, which can either be queried one entry at a time or multiple entries at a time with convenient access to alternatively spliced junctional and surrounding sequences to facilitate the design of microarray experiments.
Affiliation(s)
- C L Zheng
- University of California, San Diego, San Diego Supercomputer Center, 9500 Gilman Dr., La Jolla, CA 92093, USA.
6
Ghassemian M, Waner D, Tchieu J, Gribskov M, Schroeder JI. An integrated Arabidopsis annotation database for Affymetrix Genechip data analysis, and tools for regulatory motif searches. Trends Plant Sci 2001; 6:448-449. [PMID: 11590042 DOI: 10.1016/s1360-1385(01)02092-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 05/23/2023]
Abstract
Genome-scale sequencing projects have provided the essential information required for the construction of entire genome chips or microarrays for RNA expression studies. The Arabidopsis and rice genomes have been sequenced and whole-genome oligonucleotide arrays are being manufactured. These should soon become available to researchers. Expression studies using genomic-scale expression arrays are providing us with a vast quantity of information at a rapid pace. The rate-limiting step in this type of experiment is not data generation but data analysis. We report improvements that should facilitate the analysis of Affymetrix Genechip expression data.
Affiliation(s)
- M Ghassemian
- Division of Biology, Cell and Developmental Biology Section and Center for Molecular Genetics, University of California, San Diego, 92093-0116, La Jolla, CA, USA
7
Mäser P, Thomine S, Schroeder JI, Ward JM, Hirschi K, Sze H, Talke IN, Amtmann A, Maathuis FJ, Sanders D, Harper JF, Tchieu J, Gribskov M, Persans MW, Salt DE, Kim SA, Guerinot ML. Phylogenetic relationships within cation transporter families of Arabidopsis. Plant Physiol 2001; 126:1646-1667. [PMID: 11500563 DOI: 10.2307/4280038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 05/23/2023]
Abstract
Uptake and translocation of cationic nutrients play essential roles in physiological processes including plant growth, nutrition, signal transduction, and development. Approximately 5% of the Arabidopsis genome appears to encode membrane transport proteins. These proteins are classified in 46 unique families containing approximately 880 members. In addition, several hundred putative transporters have not yet been assigned to families. In this paper, we have analyzed the phylogenetic relationships of over 150 cation transport proteins. This analysis has focused on cation transporter gene families for which initial characterizations have been achieved for individual members, including potassium transporters and channels, sodium transporters, calcium antiporters, cyclic nucleotide-gated channels, cation diffusion facilitator proteins, natural resistance-associated macrophage proteins (NRAMP), and Zn-regulated transporter Fe-regulated transporter-like proteins. Phylogenetic trees of each family define the evolutionary relationships of the members to each other. These families contain numerous members, indicating diverse functions in vivo. Closely related isoforms and separate subfamilies exist within many of these gene families, indicating possible redundancies and specialized functions. To facilitate their further study, the PlantsT database (http://plantst.sdsc.edu) has been created that includes alignments of the analyzed cation transporters and their chromosomal locations.
Affiliation(s)
- P Mäser
- Division of Biology, Cell and Developmental Biology Section and Center for Molecular Genetics, University of California, San Diego, La Jolla, California 92093-0116, USA
8
Mäser P, Thomine S, Schroeder JI, Ward JM, Hirschi K, Sze H, Talke IN, Amtmann A, Maathuis FJ, Sanders D, Harper JF, Tchieu J, Gribskov M, Persans MW, Salt DE, Kim SA, Guerinot ML. Phylogenetic relationships within cation transporter families of Arabidopsis. Plant Physiol 2001; 126:1646-67. [PMID: 11500563 PMCID: PMC117164 DOI: 10.1104/pp.126.4.1646] [Citation(s) in RCA: 719] [Impact Index Per Article: 31.3] [Received: 02/26/2001] [Revised: 04/12/2001] [Accepted: 05/01/2001] [Indexed: 05/17/2023]
9
Reiter LT, Potocki L, Chien S, Gribskov M, Bier E. A systematic analysis of human disease-associated gene sequences in Drosophila melanogaster. Genome Res 2001; 11:1114-25. [PMID: 11381037 PMCID: PMC311089 DOI: 10.1101/gr.169101] [Citation(s) in RCA: 581] [Impact Index Per Article: 25.3] [Received: 10/25/2000] [Accepted: 04/11/2001] [Indexed: 11/24/2022]
Abstract
We performed a systematic analysis of 929 human disease gene entries associated with at least one mutant allele in the Online Mendelian Inheritance in Man (OMIM) database against the recently completed genome sequence of Drosophila melanogaster. The results of this search have been formatted as an updateable and searchable on-line database called Homophila. Our analysis identified 714 distinct human disease genes (77% of disease genes searched) matching 548 unique Drosophila sequences, which we have summarized by disease category. This breakdown into disease classes creates a picture of disease genes that are amenable to study using Drosophila as the model organism. Of the 548 Drosophila genes related to human disease genes, 153 are associated with known mutant alleles and 56 more are tagged by P-element insertions in or near the gene. Examples of how to use the database to identify Drosophila genes related to human disease genes are presented. We anticipate that cross-genomic analysis of human disease genes using the power of Drosophila second-site modifier screens will promote interaction between human and Drosophila research groups, accelerating the understanding of the pathogenesis of human genetic disease. The Homophila database is available at http://homophila.sdsc.edu.
Affiliation(s)
- L T Reiter
- Section of Cell and Developmental Biology, University of California San Diego, La Jolla, California 92093-0349, USA
10
Abstract
MOTIVATION: High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results.
RESULTS: We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.
Affiliation(s)
- D R Masys
- Department of Medicine, UCSD Cancer Center, University of California, San Diego, San Diego, CA 92093, USA
11
Gribskov M, Fana F, Harper J, Hope DA, Harmon AC, Smith DW, Tax FE, Zhang G. PlantsP: a functional genomics database for plant phosphorylation. Nucleic Acids Res 2001; 29:111-3. [PMID: 11125063 PMCID: PMC29854 DOI: 10.1093/nar/29.1.111] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
Abstract
The PlantsP database is a curated database that combines information derived from sequences with experimental functional genomics information. PlantsP focuses on plant protein kinases and protein phosphatases. The database will specifically provide a resource for information on a collection of T-DNA insertion mutants (knockouts) in each protein kinase and phosphatase in Arabidopsis thaliana. PlantsP also provides a curated view of each protein that includes a comprehensive annotation of functionally related sequence motifs, sequence family definitions, alignments and phylogenetic trees, and descriptive information drawn directly from the literature. PlantsP is available at http://PlantsP.sdsc.edu.
Affiliation(s)
- M Gribskov
- San Diego Supercomputer Center and Department of Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
12
Bourne PE, Gribskov M. ISMB-2000: bioinformatics enters a new millennium. Bioinformatics 2000; 16:749. [PMID: 11108696 DOI: 10.1093/bioinformatics/16.9.749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
13
14
Abstract
Numerous stimuli can alter the Ca2+ concentration in the cytoplasm, a factor common to many physiological responses in plant and animal cells. Calcium-binding proteins decode information contained in the temporal and spatial patterns of these Ca2+ signals and bring about changes in metabolism and gene expression. In addition to calmodulin, a calcium-binding protein found in all eukaryotes, plants contain a large family of calcium-binding regulatory protein kinases. Evidence is accumulating that these protein kinases participate in numerous aspects of plant growth and development.
Affiliation(s)
- A C Harmon
- Dept of Botany, University of Florida, Gainesville 32611-8526, USA.
15
16
Abstract
The delta antigen of hepatitis delta virus exhibits sequence specific binding to its own RNA and is essential for viral replication. Using statistical methods we have detected significant similarity between the RNA-binding domain of the hepatitis delta antigen and the HMG box of SRY. Our analysis suggests that the RNA-binding domain of HDV antigen evolved from the DNA-binding domain of the HMG box. SRY, or a related protein, is a probable cellular cognate of HDV.
Affiliation(s)
- S Veretnik
- San Diego Supercomputer Center, California, USA
17
Bourne P, Gribskov M, Johnson G, Moreland J, Wavra S, Weissig H. A prototype molecular interactive collaborative environment (MICE). Pac Symp Biocomput 1998:118-29. [PMID: 9697176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
Abstract
Illustrations of macromolecular structure in the scientific literature contain a high level of semantic content through which the authors convey, among other features, the biological function of that macromolecule. We refer to these illustrations as molecular scenes. Such scenes, if available electronically, are not readily accessible for further interactive interrogation. The basic PDB format does not retain features of the scene; formats like PostScript retain the scene but are not interactive; and the many formats used by individual graphics programs, while capable of reproducing the scene, are neither interchangeable nor can they be stored in a database and queried for features of the scene. MICE defines a Molecular Scene Description Language (MSDL) which allows scenes to be stored in a relational database (a molecular scene gallery) and queried. Scenes retrieved from the gallery are rendered in Virtual Reality Modeling Language (VRML) and currently displayed in WebView, a VRML browser modified to support the Virtual Reality Behavior System (VRBS) protocol. VRBS provides communication between multiple client browsers, each capable of manipulating the scene. This level of collaboration works well over standard Internet connections and holds promise for collaborative research at a distance and distance learning. Further, via VRBS, the VRML world can be used as a visual cue to trigger an application such as a remote MEME search. MICE is very much work in progress. Current work seeks to replace WebView with Netscape, Cosmoplayer, a standard VRML plug-in, and a Java-based console. The console consists of a generic kernel suitable for multiple collaborative applications and additional application-specific controls. Further details of the MICE project are available at http://mice.sdsc.edu.
Affiliation(s)
- P Bourne
- San Diego Supercomputer Center, CA 92186, USA
18
Abstract
Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
Affiliation(s)
- T L Bailey
- San Diego Supercomputer Center, California 92186-9784, USA.
19
Abstract
MOTIVATION: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.
RESULTS: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
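The product-of-p-values idea in this abstract can be illustrated with the standard closed form for the product of n independent uniform variates: if the product is p, the chance of a product at least as small is p times the sum of (-ln p)^i / i! for i from 0 to n-1. The sketch below applies that formula directly. It is not the QFAST code itself, which is not reproduced here, and the function name `product_pvalue` is hypothetical.

```python
import math


def product_pvalue(pvalues):
    """Combined p-value for the product of independent p-values.

    Assumes each input p-value is uniform on (0, 1] under the null
    hypothesis and that the inputs are mutually independent.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    product = math.prod(pvalues)
    if product <= 0.0:
        return 0.0
    x = -math.log(product)
    # Accumulate sum_{i=0}^{n-1} x**i / i! without large factorials.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term
    return min(1.0, product * total)
```

With a single p-value the function returns it unchanged. With two p-values of 0.1 each, the raw product is 0.01 but the combined p-value is about 0.056, which shows why the raw product cannot itself be read as a p-value.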
Affiliation(s)
- T L Bailey
- San Diego Supercomputer Center, CA 92186-9784, USA
20
Affiliation(s)
- C M Smith
- San Diego Supercomputer Center (SDSC), CA 92186, USA
21
Abstract
Several computer algorithms now exist for discovering multiple motifs (expressed as weight matrices) that characterize a family of protein sequences known to be homologous. This paper describes a method for performing similarity searches of protein sequence databases using such a group of motifs. By simultaneously using all the motifs that characterize a protein family, the sensitivity and specificity of the database search are increased. We define the p-value for a target sequence to be the probability of a random sequence of the same length scoring as well or better in comparison to all the motifs that characterize the family. (The p-value of a database search can be determined from this value and the size of the database.) We show that estimating the distribution of single motif scores by a Gaussian extreme value distribution is insufficiently accurate to provide a useful estimate of the p-value, but that this deficiency can be corrected by reestimating the parameters of the underlying Gaussian distribution from observed scores for comparison of a given motif and sequence database. These parameters are used to calculate a "reduced variate" which has a Gumbel limiting distribution. Multiple motif scores are combined to give a single p-value by using the sum of the reduced variates for the motif scores as the test statistic. We give a computationally efficient approximation to the distribution of the sum of independent Gumbel random variables and verify experimentally that it closely approximates the distribution of the test statistic. Experiments on pseudorandom sequences show that the approximated p-values are conservative, so the significance of high scores in database searches will not be overstated. Experiments with real protein sequences and motifs identified by the MEME algorithm show that determining an overall p-value based on the combination of multiple motifs gives significantly better database search results than using p-values of single motifs.
Affiliation(s)
- T L Bailey
- San Diego Supercomputer Center, California 92186-9784, USA.
|
22
|
Miller M, Geller M, Gribskov M, Kent SB. Analysis of the structure of chemically synthesized HIV-1 protease complexed with a hexapeptide inhibitor. Part I: Crystallographic refinement of 2 A data. Proteins 1997; 27:184-94. [PMID: 9061782 DOI: 10.1002/(sici)1097-0134(199702)27:2<184::aid-prot4>3.0.co;2-g] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 02/03/2023]
Abstract
The structure of a complex between a hexapeptide-based inhibitor, MVT-101, and the chemically synthesized (Aba 67,95,167,195; Aba: L-alpha-amino-n-butyric acid) protease from the human immunodeficiency virus (HIV-1), reported previously at 2.3 Å, has now been refined to a crystallographic R factor of 15.4% at 2.0 Å resolution. Root mean square deviations from ideality are 0.18 Å for bond lengths and 2.4 degrees for the angles. The inhibitor can be fitted to the difference electron density map in two alternative orientations. Drastic differences are observed for positions and interactions at P3/S3 and P3'/S3' subsites of the two orientations due to different crystallographic environments.
Affiliation(s)
- M Miller
- Macromolecular Structure Laboratory, NCI-Frederick Cancer Research Facility and Development Center, MD 21702, USA
|
23
|
Bailey TL, Gribskov M. The megaprior heuristic for discovering protein sequence patterns. Proc Int Conf Intell Syst Mol Biol 1996; 4:15-24. [PMID: 8877500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2023]
Abstract
Several computer algorithms for discovering patterns in groups of protein sequences are in use that are based on fitting the parameters of a statistical model to a group of related sequences. These include hidden Markov model (HMM) algorithms for multiple sequence alignment, and the MEME and Gibbs sampler algorithms for discovering motifs. These algorithms are sometimes prone to producing models that are incorrect because two or more patterns have been combined. The statistical model produced in this situation is a convex combination (weighted average) of two or more different models. This paper presents a solution to the problem of convex combinations in the form of a heuristic based on using extremely low variance Dirichlet mixture priors as part of the statistical model. This heuristic, which we call the megaprior heuristic, increases the strength (i.e., decreases the variance) of the prior in proportion to the size of the sequence dataset. This causes each column in the final model to strongly resemble the mean of a single component of the prior, regardless of the size of the dataset. We describe the cause of the convex combination problem, analyze it mathematically, motivate and describe the implementation of the megaprior heuristic, and show how it can effectively eliminate the problem of convex combinations in protein sequence pattern discovery.
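The effect of prior strength described here can be sketched with a toy posterior-mean calculation under an equal-weight Dirichlet mixture prior. This is an illustration under stated assumptions, not the authors' implementation: the four-letter alphabet, the two component means, the function names, and the proportionality constant of 10 are all hypothetical, and each component mean is assumed to sum to 1 with no zero entries.

```python
from math import lgamma, exp

def log_marginal(counts, alpha):
    """Log Dirichlet-multinomial likelihood of counts under one component.

    The multinomial coefficient is omitted because it is identical for
    every component and cancels when the weights are normalized.
    """
    n, a = sum(counts), sum(alpha)
    return (lgamma(a) - lgamma(a + n)
            + sum(lgamma(ai + ci) - lgamma(ai) for ai, ci in zip(alpha, counts)))

def posterior_mean(counts, means, strength):
    """Posterior mean of one model column under an equal-weight Dirichlet mixture.

    means:    mean vector of each mixture component (each sums to 1).
    strength: sum of the Dirichlet parameters of every component; a larger
              value means a lower-variance prior.
    """
    n = sum(counts)
    alphas = [[strength * m for m in mean] for mean in means]
    logs = [log_marginal(counts, a) for a in alphas]
    top = max(logs)
    weights = [exp(value - top) for value in logs]   # stabilized responsibilities
    total = sum(weights)
    column = [0.0] * len(counts)
    for w, a in zip(weights, alphas):
        for i, (ai, ci) in enumerate(zip(a, counts)):
            column[i] += (w / total) * (ai + ci) / (strength + n)
    return column

# Counts that look like a blend of two patterns (hypothetical data).
counts = [60, 0, 0, 40]
means = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
weak = posterior_mean(counts, means, strength=1.0)
mega = posterior_mean(counts, means, strength=10.0 * sum(counts))
```

In this toy case the weak prior leaves the blended column essentially unchanged, while the prior whose strength grows with the number of observations pulls the column close to the mean of the first component.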
Affiliation(s)
- T L Bailey
- San Diego Supercomputer Center, San Diego, California 92186-9784, USA.
|
24
|
Affiliation(s)
- M Gribskov
- San Diego Supercomputer Center, La Jolla, California 92093, USA
|
25
|
|
26
|
Gribskov M. Translational initiation factors IF-1 and eIF-2 alpha share an RNA-binding motif with prokaryotic ribosomal protein S1 and polynucleotide phosphorylase. Gene 1992; 119:107-11. [PMID: 1383091 DOI: 10.1016/0378-1119(92)90073-x] [Citation(s) in RCA: 59] [Impact Index Per Article: 1.8] [Indexed: 12/26/2022]
Abstract
Initiation of translation is a complicated process involving numerous accessory factors whose functions remain incompletely understood. Bacterial ribosomal protein S1 is known to contain a repeated sequence motif (S1-RM), also found in polynucleotide phosphorylase, that is thought to be involved in binding to RNA. Using the technique of profile analysis, the S1-RM can also be found in bacterial and chloroplast translation initiation factor IF-1 sequences, and in the sequences of eukaryotic translation initiation factor eIF-2 alpha chains. The significance of the similarity of the sequences is very high, suggesting that the occurrence of the S1-RM in these diverse proteins represents homology. The similarity of S1 to IF-1 further suggests that S1 has evolved from an IF-1-like ancestor, and therefore that the two proteins have a similar or competitive function. The most obvious common function of the proteins containing the S1-RM seems to be RNA binding, suggesting that IF-1 and eIF-2 alpha may bind to RNA.
Affiliation(s)
- M Gribskov
- National Cancer Institute, Frederick Cancer Research and Development Center, ABL-Basic Research Program, MD
|
27
|
Affiliation(s)
- M Lonetto
- Department of Bacteriology, University of Wisconsin, Madison 53706
|
28
|
|
29
|
Mullen JR, Kayne PS, Moerschell RP, Tsunasawa S, Gribskov M, Colavito-Shepanski M, Grunstein M, Sherman F, Sternglanz R. Identification and characterization of genes and mutants for an N-terminal acetyltransferase from yeast. EMBO J 1989; 8:2067-75. [PMID: 2551674 PMCID: PMC401092 DOI: 10.1002/j.1460-2075.1989.tb03615.x] [Citation(s) in RCA: 239] [Impact Index Per Article: 6.8] [Indexed: 11/09/2022] Open
Abstract
A gene from Saccharomyces cerevisiae has been mapped, cloned, sequenced and shown to encode a catalytic subunit of an N-terminal acetyltransferase. Regions of this gene, NAT1, and the chloramphenicol acetyltransferase genes of bacteria have limited but significant homology. A nat1 null mutant is viable but exhibits a variety of phenotypes, including reduced acetyltransferase activity, derepression of a silent mating type locus (HML) and failure to enter G0. All these phenotypes are identical to those of a previously characterized mutant, ard1. NAT1 and ARD1 are distinct genes that encode proteins with no obvious similarity. Concomitant overexpression of both NAT1 and ARD1 in yeast causes a 20-fold increase in acetyltransferase activity in vitro, whereas overexpression of either NAT1 or ARD1 alone does not raise activity over basal levels. A functional iso-1-cytochrome c protein, which is N-terminally acetylated in a NAT1 strain, is not acetylated in an isogenic nat1 mutant. At least 20 other yeast proteins, including histone H2B, are not N-terminally acetylated in either nat1 or ard1 mutants. These results suggest that NAT1 and ARD1 proteins function together to catalyze the N-terminal acetylation of a subset of yeast proteins.
Affiliation(s)
- J R Mullen
- Department of Biochemistry, State University of New York, Stony Brook 11794
|
30
|
Abstract
Profile analysis measures the similarity between a target sequence and a group of aligned sequences (the probe). The probe sequences are used to produce a position-specific scoring table (the profile) that can be aligned with any sequence (the target) using standard dynamic programming methods. We are developing a library of profiles, each describing a different structural motif. This allows any target sequence to be rapidly scanned for the presence of structural motifs. Levels of significance for the comparison of target sequences with the profile are determined in advance, permitting an objective decision to be made as to whether a protein is likely to possess a structural motif.
Affiliation(s)
- M Gribskov
- Molecular Biology Institute, University of California, Los Angeles 90024-1570
|
31
|
Abstract
Profile analysis is a method for detecting distantly related proteins by sequence comparison. The basis for comparison is not only the customary Dayhoff mutational-distance matrix but also the results of structural studies and information implicit in the alignments of the sequences of families of similar proteins. This information is expressed in a position-specific scoring table (profile), which is created from a group of sequences previously aligned by structural or sequence similarity. The similarity of any other sequence (target) to the group of aligned sequences (probe) can be tested by comparing the target to the profile using dynamic programming algorithms. The profile method differs in two major respects from methods of sequence comparison in common use: (i) Any number of known sequences can be used to construct the profile, allowing more information to be used in the testing of the target than is possible with pairwise alignment methods. (ii) The profile includes the penalties for insertion or deletion at each position, which allow one to include the probe secondary structure in the testing scheme. Tests with globin and immunoglobulin sequences show that profile analysis can distinguish all members of these families from all other sequences in a database containing 3800 protein sequences.
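The comparison of a target to a profile can be sketched as follows. This is a simplified illustration, not the published method: a plain match/mismatch similarity stands in for the Dayhoff matrix, the position-specific gap penalty is a toy rule that makes gaps cheaper where the probe alignment already contains gaps, gap costs are linear, and all names and default values are hypothetical.

```python
def build_profile(aligned, alphabet="ACDEFGHIKLMNPQRSTVWY",
                  match=1.0, mismatch=-1.0, gap_cost=2.0):
    """Build a position-specific scoring table from aligned probe sequences.

    Returns one (scores_by_residue, gap_penalty) pair per alignment column.
    """
    profile = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        n_gaps = len(column) - len(residues)
        scores = {}
        for a in alphabet:
            # average similarity of residue a to the residues seen in this column
            scores[a] = sum(match if a == b else mismatch for b in residues) / len(column)
        # toy rule: gaps observed in the probe lower the penalty at this position
        penalty = gap_cost * (1.0 - n_gaps / len(column))
        profile.append((scores, penalty))
    return profile

def profile_score(profile, target):
    """Best local alignment score of target against the profile (dynamic programming)."""
    best = 0.0
    prev = [0.0] * (len(target) + 1)
    for scores, penalty in profile:
        cur = [0.0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + scores.get(target[j - 1], 0.0)  # align residue to column
            skip_column = prev[j] - penalty      # profile column left unmatched
            skip_residue = cur[j - 1] - penalty  # target residue left unmatched
            cur[j] = max(0.0, diag, skip_column, skip_residue)
            best = max(best, cur[j])
        prev = cur
    return best

probe = ["ACDEF", "ACDEF", "AC-EF"]   # hypothetical aligned probe sequences
profile = build_profile(probe)
```

A target that matches the probe consensus scores highest, a target missing the residue at the gapped column still scores well because the gap is cheap there, and an unrelated target scores zero.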
|
32
|
Abstract
We show, using dot matrix comparisons and statistical analysis of sequence alignments, that seven sequenced sigma factors, E. coli sigma-70 and sigma-32, B. subtilis sigma-43 and sigma-29, phage SP01 gene products 28 and 34, and phage T4 gene product 55, comprise a homologous family of proteins. Sigma-70, sigma-32, and sigma-43 each have two copies of a sequence similar to the helix-turn-helix DNA binding motif seen in CRP, and lambda repressor and cro proteins. B. subtilis sigma-29, SP01 gp28, and SP01 gp34 have at least one copy similar to this sequence. We propose that a second sequence, conserved in all seven proteins, is the core RNA polymerase binding site. A third region, present only in sigma-70 and sigma-43, may also be involved in interaction with core. Available mutational evidence supports our model for sigma factor structure.
|
33
|
Gribskov M, Burgess RR, Devereux J. PEPPLOT, a protein secondary structure analysis program for the UWGCG sequence analysis software package. Nucleic Acids Res 1986; 14:327-34. [PMID: 3753771 PMCID: PMC339416 DOI: 10.1093/nar/14.1.327] [Citation(s) in RCA: 59] [Impact Index Per Article: 1.6] [Indexed: 01/07/2023] Open
Abstract
We describe a program for the analysis of protein secondary structure that operates with the Sequence Analysis Software Package of the University of Wisconsin Genetics Computer Group (UWGCG). The program produces both graphic and printed output. Structure prediction using the Chou and Fasman and Robson et al methods, and hydropathy analysis by the method of Kyte and Doolittle are included along with a simplified method of hydrophobic moment analysis. The power of the program is the coordinated presentation of many different kinds of structural information on the same plot.
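One of the analyses named here, Kyte and Doolittle hydropathy, amounts to a sliding-window average over a per-residue scale. The sketch below shows that calculation only; it is not PEPPLOT code, and the function name and the default window of 9 residues are assumptions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every full window along a protein sequence.

    Returns one value per window start; sequences shorter than the
    window give an empty list. Non-standard residues raise KeyError.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Plotting the returned values against window position gives the familiar hydropathy trace, with sustained positive stretches indicating hydrophobic segments.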
|
34
|
Gribskov M, Devereux J, Burgess RR. The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression. Nucleic Acids Res 1984; 12:539-49. [PMID: 6694906 PMCID: PMC321069 DOI: 10.1093/nar/12.1part2.539] [Citation(s) in RCA: 308] [Impact Index Per Article: 7.7] [Indexed: 01/21/2023] Open
Abstract
The codon preference plot is useful for locating genes in sequenced DNA, predicting the relative level of their expression and for detecting DNA sequencing errors resulting in the insertion or deletion of bases within a coding sequence. The three possible reading frames are displayed in parallel along with the open reading frames and plots of the location of rare codons in each reading frame.
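Two of the tracks mentioned here, open reading frames and rare-codon locations in each of the three forward frames, can be sketched in a few lines. The sketch does not compute the codon preference statistic itself. It treats an open reading frame simply as a stretch between stop codons, and the function names, the example sequence, and the rare-codon set are hypothetical inputs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons_in_frame(dna, frame):
    """Yield (position, codon) for each complete codon in forward frame 0, 1, or 2."""
    for i in range(frame, len(dna) - 2, 3):
        yield i, dna[i:i + 3]

def open_reading_frames(dna, frame, min_codons=1):
    """Return (start, end) spans of stop-free stretches in one frame (end exclusive)."""
    spans = []
    start = frame
    last_end = frame
    for pos, codon in codons_in_frame(dna, frame):
        if codon in STOP_CODONS:
            if (pos - start) // 3 >= min_codons:
                spans.append((start, pos))
            start = pos + 3          # next stretch begins after the stop codon
        last_end = pos + 3
    if (last_end - start) // 3 >= min_codons:
        spans.append((start, last_end))
    return spans

def rare_codon_positions(dna, frame, rare_codons):
    """Positions of codons from a caller-supplied rare-codon set in one frame."""
    return [pos for pos, codon in codons_in_frame(dna, frame) if codon in rare_codons]

dna = "ATGAAATAGCCCAGGTAA"      # hypothetical sequence
rare = {"AGA", "AGG"}           # assumed rare-codon set for illustration
tracks = {
    frame: (open_reading_frames(dna, frame), rare_codon_positions(dna, frame, rare))
    for frame in range(3)
}
```

Displaying the three entries of `tracks` in parallel gives a text version of the frame-by-frame layout the abstract describes.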
|
35
|
Abstract
We have constructed a plasmid that overexpresses 100-fold the sigma subunit of Escherichia coli RNA polymerase. The plasmid was constructed by placing the pLoL promoter-operator of bacteriophage lambda upstream from rpoD, the gene encoding the sigma subunit. A simple procedure for purification of the overexpressed protein has been developed based on guanidine hydrochloride denaturation/renaturation, DEAE cellulose chromatography, and Sephacryl S-200 chromatography. The purified product has been characterized and found to be indistinguishable from normally expressed sigma protein purified by previous protocols as judged by enzymatic activity, heat inactivation, and partial proteolysis.
|