1
Chrisman BS, Paskov KM, He C, Jung JY, Stockham N, Washington PY, Wall DP. A Method for Localizing Non-Reference Sequences to the Human Genome. Pac Symp Biocomput 2022; 27:313-324. [PMID: 34890159 PMCID: PMC8730539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
As the last decade of human genomics research begins to bear the fruit of advancements in precision medicine, it is important to ensure that genomics' improvements in human health are distributed globally and equitably. An important step to ensuring health equity is to improve the human reference genome to capture global diversity by including a wide variety of alternative haplotypes, sequences that are not currently captured on the reference genome. We present a method that localizes 100 basepair (bp) long sequences extracted from short-read sequencing that can ultimately be used to identify what regions of the human genome non-reference sequences belong to. We extract reads that don't align to the reference genome, and compute the population's distribution of 100-mers found within the unmapped reads. We use genetic data from families to identify shared genetic material between siblings and match the distribution of unmapped k-mers to these inheritance patterns to determine the most likely genomic region of a k-mer. We perform this localization with two highly interpretable methods of artificial intelligence: a computationally tractable Hidden Markov Model coupled to a Maximum Likelihood Estimator. Using a set of alternative haplotypes with known locations on the genome, we show that our algorithm is able to localize 96% of k-mers with over 90% accuracy and less than 1Mb median resolution. As the collection of sequenced human genomes grows larger and more diverse, we hope that this method can be used to improve the human reference genome, a critical step in addressing precision medicine's diversity crisis.
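The k-mer counting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `count_kmers`, the toy reads, and the choice of k=3 (rather than the 100-mers the abstract describes) are assumptions made only to keep the example readable.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring across a collection of reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read; reads shorter than k
        # contribute nothing because the range is empty.
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts

# Hypothetical unmapped reads; k=3 is used purely for illustration.
unmapped_reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(unmapped_reads, k=3)
```

Here each of the four distinct 3-mers (ACG, CGT, GTA, TAC) appears once in each read, so every count is 2. Repeating the count per sample would give the kind of per-individual k-mer distribution that the abstract then compares against inheritance patterns.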
Affiliation(s)
- Kelley M Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Chloe He
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Jae-Yoon Jung
- Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
- Nate Stockham
- Department of Neuroscience, Stanford University, Stanford, CA 94305, USA
- Dennis Paul Wall
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
2
Stockham NT, Paskov KM, Tabatabaei K, Sutaria S, Liu B, Kent J, Wall DP. An Informatics Analysis to Identify Sex Disparities and Healthcare Needs for Autism across the United States. AMIA Annu Symp Proc 2022; 2022:456-465. [PMID: 35854759 PMCID: PMC9285147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
Autism is among the most common neurodevelopmental conditions. Timely diagnosis and access to therapeutic resources are essential for positive prognoses, yet long queues and unevenly dispersed resources leave many untreated. Without granular estimates of autism prevalence by geographic area, it is difficult to identify unmet needs and mechanisms to address them. Mining a dataset of 53M children using meaningful geographic regions, we computed autism prevalence across the country. We then performed comparative analysis against 50,000 resources to identify the type and extent of gaps in access to autism services. We find a steady increase in autism diagnoses from K-5, supporting delayed diagnosis of autism, and consistent under-diagnosis of females. We find a significant inverse relationship between prevalence and availability of resources (p < 0.001). While more work is needed to characterize additional trends including racial and ethnicity-based disparities, the identification of resource gaps can direct and prioritize new innovations.
Affiliation(s)
- Kelley M Paskov
- Stanford University, Stanford, California
- These authors contributed equally
- Kevin Tabatabaei
- Stanford University, Stanford, California
- McMaster University, Hamilton, Canada
- Jack Kent
- Stanford University, Stanford, California
3
Chrisman BS, Paskov KM, Stockham N, Jung JY, Varma M, Washington PY, Tataru C, Iwai S, DeSantis TZ, David M, Wall DP. Improved detection of disease-associated gut microbes using 16S sequence-based biomarkers. BMC Bioinformatics 2021; 22:509. [PMID: 34666677 PMCID: PMC8527694 DOI: 10.1186/s12859-021-04427-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/20/2020] [Accepted: 10/06/2021] [Indexed: 12/31/2022] Open
Abstract
Background: Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against other existing aggregation methods (OTU clustering and MicroPheno or DiTaxa features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than the state-of-the-art methods in biomarker discovery and phenotype prediction. Results: On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods in both a permutation-test-based analysis and using linear discriminant analysis effect size. By grouping microbes by SBB, we can identify several differentially abundant microbial groups (FDR < 0.1) between children with autism and neurotypical controls in a set of 115 discordant siblings. Porphyromonadaceae, Ruminococcaceae, and an unnamed species of Blastocystis were significantly enriched in autism, while Veillonellaceae was significantly depleted. Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR < 0.1). We observed Megasphaera and Sutterellaceae highly enriched in obesity, and Phocaeicola significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of 0.64 and the obesity phenotype with an ROC-AUC score of 0.84. Conclusions: SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from http://github.com/briannachrisman/16s_biomarkers.
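As a rough illustration of the permutation-test idea used for biomarker discovery, the sketch below computes an exact two-sided p-value for a difference in group means by enumerating every relabeling of a small sample. It is a generic textbook test, not the procedure in the linked repository; the function name and the toy abundance values are assumptions.

```python
from itertools import combinations

def exact_permutation_pvalue(cases, controls):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    n_controls = len(controls)
    total = sum(pooled)
    observed = abs(sum(cases) / n_cases - sum(controls) / n_controls)

    extreme = 0
    n_splits = 0
    # Enumerate every way of labeling n_cases of the pooled values as "case".
    for idx in combinations(range(len(pooled)), n_cases):
        case_sum = sum(pooled[i] for i in idx)
        diff = abs(case_sum / n_cases - (total - case_sum) / n_controls)
        # Small tolerance so ties with the observed statistic count as extreme.
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits

# Hypothetical abundances of one microbial group in two cohorts.
p_value = exact_permutation_pvalue([5, 6, 7], [1, 2, 3])
```

With three samples per group there are 20 possible relabelings, and only the observed split and its mirror image are at least as extreme, so `p_value` is 0.1. Exhaustive enumeration is only practical for small cohorts; larger studies typically sample random relabelings instead.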
Affiliation(s)
- Brianna S Chrisman
- Department of Bioengineering, Stanford University, Serra Mall, Stanford, USA
- Kelley M Paskov
- Department of Biomedical Data Science, Stanford University, Serra Mall, Stanford, USA
- Nate Stockham
- Department of Neuroscience, Stanford University, Serra Mall, Stanford, USA
- Jae-Yoon Jung
- Department of Biomedical Data Science, Stanford University, Serra Mall, Stanford, USA
- Maya Varma
- Department of Computer Science, Stanford University, Serra Mall, Stanford, USA
- Peter Y Washington
- Department of Bioengineering, Stanford University, Serra Mall, Stanford, USA
- Christine Tataru
- Department of Computer Science, Oregon State University, SW Campus Way, Corvallis, USA
- Shoko Iwai
- Second Genome Inc, Allerton Ave, Brisbane, USA
- Maude David
- Department of Microbiology, Oregon State University, SW Campus Way, Corvallis, USA
- Dennis P Wall
- Department of Biomedical Data Science, Stanford University, Serra Mall, Stanford, USA
- Department of Pediatrics (Systems Medicine), Stanford University, 1265 Welch Road, Stanford, USA
4
Varma M, Paskov KM, Chrisman BS, Sun MW, Jung JY, Stockham NT, Washington PY, Wall DP. A maximum flow-based network approach for identification of stable noncoding biomarkers associated with the multigenic neurological condition, autism. BioData Min 2021; 14:28. [PMID: 33941233 PMCID: PMC8091705 DOI: 10.1186/s13040-021-00262-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2020] [Accepted: 04/20/2021] [Indexed: 12/05/2022] Open
Abstract
Background: Machine learning approaches for predicting disease risk from high-dimensional whole genome sequence (WGS) data often result in unstable models that can be difficult to interpret, limiting the identification of putative sets of biomarkers. Here, we design and validate a graph-based methodology based on maximum flow, which leverages the presence of linkage disequilibrium (LD) to identify stable sets of variants associated with complex multigenic disorders. Results: We apply our method to a previously published logistic regression model trained to identify variants in simple repeat sequences associated with autism spectrum disorder (ASD); this L1-regularized model exhibits high predictive accuracy yet demonstrates great variability in the features selected from over 230,000 possible variants. In order to improve model stability, we extract the variants assigned non-zero weights in each of 5 cross-validation folds and then assemble the five sets of features into a flow network subject to LD constraints. The maximum flow formulation allowed us to identify 55 variants, which we show to be more stable than the features identified by the original classifier. Conclusion: Our method allows for the creation of machine learning models that can identify predictive variants. Our results help pave the way towards biomarker-based diagnosis methods for complex genetic disorders. Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-021-00262-x.
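For readers unfamiliar with maximum flow, the sketch below implements the standard Edmonds-Karp algorithm on a toy network. It shows only the generic computation; how the paper builds its network from cross-validation folds and LD constraints is not described in the abstract, so the node names and capacities here are purely hypothetical.

```python
from collections import defaultdict, deque

def max_flow(edges, source, sink):
    """Edmonds-Karp maximum flow; edges is a list of (u, v, capacity)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # ensure the reverse edge exists in the residual graph

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path remains

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push flow along the path and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical toy network with source "s" and sink "t".
toy_edges = [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 3)]
toy_flow = max_flow(toy_edges, "s", "t")
```

In this toy network the source can emit 5 units and the sink can absorb 5 units, and routing one unit from `a` through `b` lets the full 5 units pass, so `toy_flow` is 5.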
Affiliation(s)
- Maya Varma
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Kelley M Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Min Woo Sun
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jae-Yoon Jung
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
- Nate T Stockham
- Department of Neuroscience, Stanford University, Stanford, CA, USA
- Dennis P Wall
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
5
Sun MW, Moretti S, Paskov KM, Stockham NT, Varma M, Chrisman BS, Washington PY, Jung JY, Wall DP. Game theoretic centrality: a novel approach to prioritize disease candidate genes by combining biological networks with the Shapley value. BMC Bioinformatics 2020; 21:356. [PMID: 32787845 PMCID: PMC7430867 DOI: 10.1186/s12859-020-03693-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/06/2019] [Accepted: 07/21/2020] [Indexed: 11/13/2022] Open
Abstract
Background: Complex human health conditions with etiological heterogeneity like Autism Spectrum Disorder (ASD) often pose a challenge for traditional genome-wide association study approaches in defining a clear genotype to phenotype model. Coalitional game theory (CGT) is an exciting method that can consider the combinatorial effect of groups of variants working in concert to produce a phenotype. CGT has been applied to associate likely-gene-disrupting variants encoded from whole genome sequence data to ASD; however, this previous approach cannot account for prior biological knowledge. Here we extend CGT to incorporate a priori knowledge from biological networks through a game theoretic centrality measure based on Shapley value to rank genes by their relevance, that is, the individual gene's synergistic influence in a gene-to-gene interaction network. Game theoretic centrality extends the notion of Shapley value to the evaluation of a gene's contribution to the overall connectivity of its corresponding node in a biological network. Results: We implemented and applied game theoretic centrality to rank genes on whole genomes from 756 multiplex autism families. Top ranking genes with the highest game theoretic centrality in both the weighted and unweighted approaches were enriched for pathways previously associated with autism, including pathways of the immune system. Four of the selected genes (HLA-A, HLA-B, HLA-G, and HLA-DRB1) have also been implicated in ASD and further support the link between ASD and the human leukocyte antigen complex. Conclusions: Game theoretic centrality can prioritize influential, disease-associated genes within biological networks, and assist in the decoding of polygenic associations to complex disorders like autism.
Affiliation(s)
- Min Woo Sun
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Department of Pediatrics, Stanford University, Stanford, USA
- Stefano Moretti
- LAMSADE, CNRS, Université Paris-Dauphine, Université PSL, Paris, France
- Kelley M Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Nate T Stockham
- Department of Neuroscience, Stanford University, Stanford, USA
- Maya Varma
- Department of Computer Science, Stanford University, Stanford, USA
- Jae-Yoon Jung
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Department of Pediatrics, Stanford University, Stanford, USA
- Dennis P Wall
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Department of Pediatrics, Stanford University, Stanford, USA
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, United States
6
Sun MW, Gupta* A, Varma M, Paskov KM, Jung JY, Stockham NT, Wall DP. Coalitional Game Theory Facilitates Identification of Non-Coding Variants Associated With Autism. Biomed Inform Insights 2019; 11:1178222619832859. [PMID: 30886520 PMCID: PMC6410388 DOI: 10.1177/1178222619832859] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/04/2018] [Accepted: 12/17/2018] [Indexed: 12/18/2022]
Abstract
Studies on autism spectrum disorder (ASD) have amassed substantial evidence for the role of genetics in the disease's phenotypic manifestation. A large number of coding and non-coding variants with low penetrance likely act in a combinatorial manner to explain the variable forms of ASD. However, many of these combined interactions, both additive and epistatic, remain undefined. Coalitional game theory (CGT) is an approach that seeks to identify players (individual genetic variants or genes) who tend to improve the performance (association to a disease phenotype of interest) of any coalition (subset of co-occurring genetic variants) they join. This method has been previously applied to boost biologically informative signal from gene expression data and exome sequencing data but remains to be explored in the context of cooperativity among non-coding genomic regions. We describe our extension of previous work, highlighting non-coding chromosomal regions relevant to ASD using CGT on alteration data of 4595 fully sequenced genomes from 756 multiplex families. Genomes were encoded into binary matrices for three types of non-coding regions previously implicated in ASD and separated into ASD (case) and unaffected (control) samples. A player metric, the Shapley value, enabled determination of individual variant contributions in both sets of cohorts. A total of 30 non-coding positions were found to have significantly elevated player scores and likely represent significant contributors to the genetic coordination underlying ASD. Cross-study analyses revealed that a subset of mutated non-coding regions (all of which are in human accelerated regions (HARs)) and related genes are involved in biological pathways or behavioral outcomes known to be affected in autism, suggesting the importance of single nucleotide polymorphisms (SNPs) within HARs in ASD. These findings support the use of CGT in identifying hidden yet influential non-coding players from large-scale genomic data, to better understand the precise underpinnings of complex neurodevelopmental disorders such as autism.
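The Shapley value referred to above can be sketched for a tiny game by averaging each player's marginal contribution over all orderings. This is the textbook definition rather than the study's implementation; the variant names and the toy characteristic function `toy_value` are hypothetical, and exhaustive enumeration is feasible only for a handful of players.

```python
from fractions import Fraction
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all player orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}

def toy_value(coalition):
    """Hypothetical game: a coalition scores 1 only if it holds both v1 and v2."""
    return 1 if {"v1", "v2"} <= coalition else 0

scores = shapley_values(["v1", "v2", "v3"], toy_value)
```

In this toy game `v1` and `v2` each receive a score of 1/2 because the payoff requires both of them, while `v3` never changes any coalition's value and scores 0. Ranking players by such scores is the general idea behind picking out influential variants.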
Affiliation(s)
- Min Woo Sun
- Departments of Pediatrics (Division of Systems Medicine), Psychiatry (by courtesy), and Biomedical Data Science, Stanford University, Stanford, CA, USA
- Anika Gupta*
- Departments of Pediatrics (Division of Systems Medicine), Psychiatry (by courtesy), and Biomedical Data Science, Stanford University, Stanford, CA, USA
- Maya Varma
- Departments of Pediatrics (Division of Systems Medicine), Psychiatry (by courtesy), and Biomedical Data Science, Stanford University, Stanford, CA, USA
- Kelley M Paskov
- Departments of Pediatrics (Division of Systems Medicine), Psychiatry (by courtesy), and Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jae-Yoon Jung
- Departments of Pediatrics (Division of Systems Medicine), Psychiatry (by courtesy), and Biomedical Data Science, Stanford University, Stanford, CA, USA
- Nate T Stockham
- Departments of Pediatrics (Division of Systems Medicine), Psychiatry (by courtesy), and Biomedical Data Science, Stanford University, Stanford, CA, USA
- Dennis P Wall
- Departments of Pediatrics (Division of Systems Medicine), Psychiatry (by courtesy), and Biomedical Data Science, Stanford University, Stanford, CA, USA
7
Paskov KM, Wall DP. A Low Rank Model for Phenotype Imputation in Autism Spectrum Disorder. AMIA Jt Summits Transl Sci Proc 2018; 2017:178-187. [PMID: 29888068 PMCID: PMC5961817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Autism Spectrum Disorder is a highly heterogeneous condition currently diagnosed using behavioral symptoms. A better understanding of the phenotypic subtypes of autism is a necessary component of the larger goal of mapping autism genotype to phenotype. However, as with most clinical records describing human disease, the phenotypic data available for autism contains varying levels of noise and incompleteness that complicate analysis. Here we analyze behavioral data from 16,291 subjects using 250 items from three gold standard diagnostic instruments. We apply a low-rank model to impute missing entries and entire missing instruments with high fidelity, showing that we can complete clinical records for all subjects. Finally, we analyze the low-rank representation of our subjects to identify plausible subtypes of autism, setting the stage for genome-to-phenome prediction experiments. These procedures can be adapted and used with other similarly structured clinical records to enable a more complete mapping between genome and phenome.
8
Hellerstedt ST, Nash RS, Weng S, Paskov KM, Wong ED, Karra K, Engel SR, Cherry JM. Curated protein information in the Saccharomyces genome database. Database (Oxford) 2017; 2017:3066359. [PMID: 28365727 PMCID: PMC5467551 DOI: 10.1093/database/bax011] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/31/2016] [Accepted: 01/27/2017] [Indexed: 12/21/2022]
Abstract
Due to recent advancements in the production of experimental proteomic data, the Saccharomyces genome database (SGD; www.yeastgenome.org) has been expanding our protein curation activities to make new data types available to our users. Because of broad interest in post-translational modifications (PTM) and their importance to protein function and regulation, we have recently started incorporating expertly curated PTM information on individual protein pages. Here we also present the inclusion of new abundance and protein half-life data obtained from high-throughput proteome studies. These new data types have been included with the aim to facilitate cellular biology research. Database URL: www.yeastgenome.org
Affiliation(s)
- Robert S Nash
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Shuai Weng
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Kelley M Paskov
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Edith D Wong
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Kalpana Karra
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Stacia R Engel
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
9
Sheppard TK, Hitz BC, Engel SR, Song G, Balakrishnan R, Binkley G, Costanzo MC, Dalusag KS, Demeter J, Hellerstedt ST, Karra K, Nash RS, Paskov KM, Skrzypek MS, Weng S, Wong ED, Cherry JM. The Saccharomyces Genome Database Variant Viewer. Nucleic Acids Res 2015; 44:D698-702. [PMID: 26578556 PMCID: PMC4702884 DOI: 10.1093/nar/gkv1250] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/14/2015] [Accepted: 11/02/2015] [Indexed: 11/18/2022] Open
Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer.
Affiliation(s)
- Travis K Sheppard
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Stacia R Engel
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Giltae Song
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Rama Balakrishnan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Gail Binkley
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Maria C Costanzo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Kyla S Dalusag
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Janos Demeter
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Sage T Hellerstedt
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Kalpana Karra
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Robert S Nash
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Kelley M Paskov
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Marek S Skrzypek
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Shuai Weng
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Edith D Wong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
10
Costanzo MC, Engel SR, Wong ED, Lloyd P, Karra K, Chan ET, Weng S, Paskov KM, Roe GR, Binkley G, Hitz BC, Cherry JM. Saccharomyces genome database provides new regulation data. Nucleic Acids Res 2013; 42:D717-25. [PMID: 24265222 PMCID: PMC3965049 DOI: 10.1093/nar/gkt1158] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 01/17/2023] Open
Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the community resource for genomic, gene and protein information about the budding yeast Saccharomyces cerevisiae, containing a variety of functional information about each yeast gene and gene product. We have recently added regulatory information to SGD and present it on a new tabbed section of the Locus Summary entitled 'Regulation'. We are compiling transcriptional regulator-target gene relationships, which are curated from the literature at SGD or imported, with permission, from the YEASTRACT database. For nearly every S. cerevisiae gene, the Regulation page displays a table of annotations showing the regulators of that gene, and a graphical visualization of its regulatory network. For genes whose products act as transcription factors, the Regulation page also shows a table of their target genes, accompanied by a Gene Ontology enrichment analysis of the biological processes in which those genes participate. We additionally synthesize information from the literature for each transcription factor in a free-text Regulation Summary, and provide other information relevant to its regulatory function, such as DNA binding site motifs and protein domains. All of the regulation data are available for querying, analysis and download via YeastMine, the InterMine-based data warehouse system in use at SGD.
Affiliation(s)
- Maria C Costanzo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA