1
Liu Z, Zhao X. piRNAs as emerging biomarkers and physiological regulatory molecules in cardiovascular disease. Biochem Biophys Res Commun 2024; 711:149906. [PMID: 38640879 DOI: 10.1016/j.bbrc.2024.149906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/01/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
Cardiovascular diseases (CVD) represent one of the most considerable global health threats, owing to their high incidence and mortality rates. Despite the ongoing advancements in detection, prevention, treatment, and prognosis of CVD, which have resulted in a decline in both incidence and mortality rates, CVD remains a major public health concern. Therefore, novel diagnostic biomarkers and therapeutic interventions are imperative to minimise the risk of CVD. Non-coding RNAs (ncRNAs) have recently gained increasing attention, with PIWI-interacting RNAs (piRNAs) emerging as a class of small ncRNAs traditionally recognised for their role in silencing transposons within cells. Although the functional roles of PIWI proteins and piRNAs in human cells remain unclear, growing evidence suggests that these molecules are gradually becoming valuable biomarkers for the diagnosis and treatment of CVD. This review provides a comprehensive summary of the latest studies on piRNAs in CVD. This review discusses the roles of piRNAs in various cardiovascular subtypes, including myocardial hypertrophy, heart failure, myocardial infarction, and cardiac regeneration. The perceived insights may contribute novel perspectives for the diagnosis and treatment of CVD.
Affiliation(s)
- Zhihua Liu
- School of Basic Medical Sciences, Center for Precision Medicine, Kunming YanAn Hospital & Kunming University of Science and Technology, Kunming, China; Department of Biostatistics and Computational Biology, Bayer HealthCare, Harvard University, Boston, MA, USA.
- Xi Zhao
- School of Basic Medical Sciences, Center for Precision Medicine, Kunming YanAn Hospital & Kunming University of Science and Technology, Kunming, China
2
Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022; 23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure. RESULTS To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of the Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL linearly scales regarding the dimension of the common ancestor subgraph regardless of the ontology size. 
Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
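The abstract above names the idea behind AncSPL (restricting the shortest-path search to the ancestors of the two concepts) but gives no algorithmic detail. The following Python sketch illustrates that general idea only; it is not the HESML implementation. The toy taxonomy, the function names, and the use of an unweighted breadth-first search are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept maps to its direct parents (is-a links).
PARENTS = {
    "root": [],
    "disease": ["root"],
    "heart_disease": ["disease"],
    "liver_disease": ["disease"],
    "infarction": ["heart_disease"],
    "heart_failure": ["heart_disease"],
}

def ancestors(node, parents):
    """Return the node together with all of its ancestors."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def ancestor_path_length(a, b, parents):
    """Shortest is-a path length between a and b, searching only the
    subgraph induced by the ancestors of both concepts (assumed scheme)."""
    allowed = ancestors(a, parents) | ancestors(b, parents)
    # Undirected adjacency restricted to the ancestor subgraph.
    neighbours = {node: set() for node in allowed}
    for child in allowed:
        for parent in parents.get(child, ()):
            neighbours[child].add(parent)
            neighbours[parent].add(child)
    # Breadth-first search gives the shortest path in an unweighted graph.
    queue = deque([(a, 0)])
    visited = {a}
    while queue:
        node, distance = queue.popleft()
        if node == b:
            return distance
        for nxt in neighbours[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, distance + 1))
    return None  # no connecting path inside the ancestor subgraph

print(ancestor_path_length("infarction", "heart_failure", PARENTS))
print(ancestor_path_length("infarction", "liver_disease", PARENTS))
```

Because paths through descendants are never explored, the result can overestimate the true shortest path in a taxonomy with multiple inheritance, which is consistent with the abstract describing the method as an approximation.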
Affiliation(s)
- Juan J. Lastra-Díaz
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Alicia Lara-Clares
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Ana Garcia-Serrano
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
3
Nguyen QH, Le DH. Similarity Calculation, Enrichment Analysis, and Ontology Visualization of Biomedical Ontologies using UFO. Curr Protoc 2021; 1:e115. [PMID: 33900688 DOI: 10.1002/cpz1.115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
The rapid growth of biomedical ontologies observed in recent years has been reported to be useful in various applications. In this article, we propose two main-function protocols (term-related and entity-related) with the three most common ontology analyses, including similarity calculation, enrichment analysis, and ontology visualization, which can be done by separate methods. Many previously developed tools implementing those methods run on different platforms and implement only a limited number of the similarity calculation and enrichment analysis methods for a specific type of biomedical ontology, although the methods can be applied to any type. Moreover, depending on the application, methods have distinct advantages; thus, the greater the number of methods a tool has, the better the decisions that users can make. The protocol here implements all the analyses above using an advanced popular tool called UFO. UFO is a Cytoscape app that unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for biomedical ontologies in OBO format, which can calculate the similarity between two sets of entities and weigh imported entity networks, as well as generate functional similarity networks. The complete protocol can be performed in 30 min and is designed for use by biologists with no prior bioinformatics training. © 2021 Wiley Periodicals LLC. Basic Protocol: Running UFO using a list of input Gene Ontology, Disease Ontology, or Human Phenotype Ontology data.
Affiliation(s)
- Quang-Huy Nguyen
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
- Duc-Hau Le
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam; School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
4
Le DH. UFO: A tool for unifying biomedical ontology-based semantic similarity calculation, enrichment analysis and visualization. PLoS One 2020; 15:e0235670. [PMID: 32645039 PMCID: PMC7347127 DOI: 10.1371/journal.pone.0235670] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/17/2019] [Accepted: 06/22/2020] [Indexed: 02/06/2023]
Abstract
Background Biomedical ontologies have been growing quickly and have proven to be useful in many biomedical applications. Important applications of those data include estimating the functional similarity between ontology terms and between annotated biomedical entities, and analyzing enrichment for a set of biomedical entities. Many semantic similarity calculation and enrichment analysis methods have been proposed for such applications. Also, a number of tools implementing the methods have been developed on different platforms. However, these tools have implemented only a small number of the semantic similarity calculation and enrichment analysis methods for a certain type of biomedical ontology. Note that the methods can be applied to all types of biomedical ontologies. More importantly, each method can be dominant in different applications; thus, users have more choice when tools implement a greater number of methods. Also, more functions would facilitate their work with ontologies. Results In this study, we developed a Cytoscape app, named UFO, which unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for all types of biomedical ontologies in OBO format. Based on the similarity calculation, UFO can calculate the similarity between two sets of entities and weigh imported entity networks as well as generate functional similarity networks. Besides, it can perform enrichment analysis of a set of entities by different methods. Moreover, UFO can visualize structural relationships between ontology terms, annotating relationships between entities and terms, and functional similarity between entities.
Finally, we demonstrated the ability of UFO through some case studies on finding the best semantic similarity measures for assessing the similarity between human disease phenotypes, constructing biomedical entity functional similarity networks for predicting disease-associated biomarkers, and performing enrichment analysis on a set of similar phenotypes. Conclusions Taken together, UFO is expected to be a tool where biomedical ontologies can be exploited for various biomedical applications. Availability UFO is distributed as a Cytoscape app, and can be downloaded freely at Cytoscape App (http://apps.cytoscape.org/apps/ufo) for non-commercial use
Affiliation(s)
- Duc-Hau Le
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
- School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
5
Mazandu GK, Chimusa ER, Mulder NJ. Gene Ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery. Brief Bioinform 2017; 18:886-901. [PMID: 27473066 DOI: 10.1093/bib/bbw067] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 01/27/2016] [Indexed: 01/02/2023]
Abstract
Gene Ontology (GO) semantic similarity tools enable retrieval of semantic similarity scores, which incorporate biological knowledge embedded in the GO structure for comparing or classifying different proteins or list of proteins based on their GO annotations. This facilitates a better understanding of biological phenomena underlying the corresponding experiment and enables the identification of processes pertinent to different biological conditions. Currently, about 14 tools are available, which may play an important role in improving protein analyses at the functional level using different GO semantic similarity measures. Here we survey these tools to provide a comprehensive view of the challenges and advances made in this area to avoid redundant effort in developing features that already exist, or implementing ideas already proven to be obsolete in the context of GO. This helps researchers, tool developers, as well as end users, understand the underlying semantic similarity measures implemented through knowledge of pertinent features of, and issues related to, a particular tool. This should empower users to make appropriate choices for their biological applications and ensure effective knowledge discovery based on GO annotations.
6
Safaei A, Rezaei Tavirani M, Zamanian Azodi M, Lashay A, Mohammadi SF, Ghasemi Broumand M, Peyvandi AA, Okhovatian F, Peyvandi H, Rostami Nejad M. Diabetic Retinopathy and Laser Therapy in Rats: A Protein-Protein Interaction Network Analysis. J Lasers Med Sci 2017; 8:S20-S21. [PMID: 29071030 DOI: 10.15171/jlms.2017.s4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 01/19/2023]
Abstract
Introduction: Diabetic retinopathy (DR) is a serious microvascular complication of diabetes which can ultimately cause vision loss or blindness. Non-enzymatic glycation of proteins leads to advanced glycation end products (AGEs) in DR. Since laser therapy is a well-established method, in this study, protein-protein interaction (PPI) network analysis is applied to protein targets in DR in rats treated by laser. Methods: In this study, we focused on articles that investigated and compared the proteome profiles of DR rats with healthy controls and also of DR rats before and after laser therapy. The networks of related differentially expressed proteins were explored using Cytoscape version 3.3.0, the PPI analysis methods and ClueGO. Results: Analysis of the PPI network of 37 proteins related to DR rats, comprising 108 nodes, identified 10 hub-bottleneck proteins and 5 related biochemical pathways. PPI analysis of proteins related to DR rats before and after laser therapy, on the other hand, yielded 33 proteins and 2 biological pathways. Discussion: Centrality and cluster screening identified hub-bottleneck genes, including Aldoa, HSPD1, Pgam2, Mapk3, SLC2A4, Ctnnb1, Ywhab, HSPA8, GAPDH and Actb for DR rats versus healthy controls, and ENO1, Aldoa and GAPDH for DR samples after laser therapy. Conclusion: Gene expression analysis of the DR samples treated via laser therapy provides molecular evidence in support of the therapeutic effect of laser.
Affiliation(s)
- Akram Safaei
- Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mona Zamanian Azodi
- Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Alireza Lashay
- Eye Research Center, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Seyed Farzad Mohammadi
- Eye Research Center, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Mohamad Ghasemi Broumand
- Physiotherapy Research Centre, School of Rehabilitation, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Asghar Peyvandi
- Hearing Disorder Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farshad Okhovatian
- Physiotherapy Research Centre, School of Rehabilitation, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hassan Peyvandi
- Hearing Disorder Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Rostami Nejad
- Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
7
Shen F, Sohn S, Rastegar-Mojarad M, Liu S, Pankratz JJ, Hatton MA, Sowada N, Shrestha OK, Shurson SL, Liu H. Populating Physician Biographical Pages Based on EMR Data. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2017; 2017:522-530. [PMID: 28815152 PMCID: PMC5543344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
The physicians' biographical pages are essential in providing information about physicians' specialties. However, physicians may not have biographical pages or the current pages are not comprehensive. We hypothesize that physicians' specialty information can be mined from Electronic Medical Records (EMRs) of their patients. We proposed an automated physician specialty populating (PSP) system that analyzes physician-ascertained diagnoses in EMRs, aggregates them to an appropriate granularity based on the current biographical pages, and populates the biographical pages accordingly. In this study, we applied the system using EMR data from Mayo Clinic and evaluated the system using the current biographical pages regarding various ranking strategies. Preliminary results demonstrated that using EMR data is a scalable and systematic way to populate physicians' biographical pages.
Affiliation(s)
- Feichen Shen
- Department of Health Sciences Research, Rochester, MN, USA
- Sunghwan Sohn
- Department of Health Sciences Research, Rochester, MN, USA
- Sijia Liu
- Department of Health Sciences Research, Rochester, MN, USA
- Hongfang Liu
- Department of Health Sciences Research, Rochester, MN, USA
8
Al-Dalky R, Taha K, Al Homouz D, Qasaimeh M. Applying Monte Carlo Simulation to Biomedical Literature to Approximate Genetic Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:494-504. [PMID: 26415184 DOI: 10.1109/tcbb.2015.2481399] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/05/2023]
Abstract
Biologists often need to know the set of genes associated with a given set of genes or a given disease. We propose in this paper a classifier system called Monte Carlo for Genetic Network (MCforGN) that can construct genetic networks, identify functionally related genes, and predict gene-disease associations. MCforGN identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g , the system first extracts the set of genes found within the abstracts of biomedical literature associated with g. It then ranks these genes to determine the ones with high co-occurrences with g . It overcomes the limitations of current approaches that employ analytical deterministic algorithms by applying Monte Carlo Simulation to approximate genetic networks. It does so by conducting repeated random sampling to obtain numerical results and to optimize these results. Moreover, it analyzes results to obtain the probabilities of different genes' co-occurrences using series of statistical tests. MCforGN can detect gene-disease associations by employing a combination of centrality measures (to identify the central genes in disease-specific genetic networks) and Monte Carlo Simulation. MCforGN aims at enhancing state-of-the-art biological text mining by applying novel extraction techniques. We evaluated MCforGN by comparing it experimentally with nine approaches. Results showed marked improvement.
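The co-occurrence ranking with repeated random sampling described above can be illustrated with a short Python sketch. The abstract does not specify the sampling scheme or the statistical tests used by MCforGN, so everything here is an assumption: the toy abstracts, the function name, and the choice to estimate, for each gene, how often it appears alongside the target gene in a small random sample of the target's abstracts.

```python
import random
from collections import Counter

# Hypothetical toy corpus: each abstract is reduced to the set of genes it mentions.
ABSTRACTS = [
    {"TP53", "MDM2"},
    {"TP53", "MDM2", "BRCA1"},
    {"TP53", "MDM2"},
    {"TP53", "MDM2"},
    {"TP53"},
    {"EGFR", "KRAS"},
]

def cooccurrence_probabilities(target, abstracts, n_rounds=200, sample_size=2, seed=0):
    """Estimate, by repeated random sampling, the probability that each gene
    co-occurs with `target` in a random sample of the target's abstracts."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    relevant = [genes for genes in abstracts if target in genes]
    if not relevant:
        return {}
    k = min(sample_size, len(relevant))
    hits = Counter()
    for _ in range(n_rounds):
        sample = rng.sample(relevant, k)
        # Genes seen with the target in at least one sampled abstract.
        hits.update(set().union(*sample) - {target})
    return {gene: count / n_rounds for gene, count in hits.items()}

probabilities = cooccurrence_probabilities("TP53", ABSTRACTS)
ranking = sorted(probabilities, key=probabilities.get, reverse=True)
print(ranking)
```

Genes that never appear with the target (here `EGFR` and `KRAS`) receive no score at all, while frequent partners rise to the top of the ranking.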
9
Safaei A, Rezaei Tavirani M, Arefi Oskouei A, Zamanian Azodi M, Mohebbi SR, Nikzamir AR. Protein-protein interaction network analysis of cirrhosis liver disease. GASTROENTEROLOGY AND HEPATOLOGY FROM BED TO BENCH 2016; 9:114-23. [PMID: 27099671 PMCID: PMC4833850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
AIM Evaluation of the biological characteristics of 13 identified proteins of patients with cirrhotic liver disease is the main aim of this research. BACKGROUND In clinical usage, liver biopsy remains the gold standard for diagnosis of hepatic fibrosis. Evaluation and confirmation of liver fibrosis stages and severity of chronic diseases require precise and noninvasive biomarkers. Since the early detection of cirrhosis is a clinical problem, achieving a sensitive, specific and predictive novel method based on biomarkers is an important task. METHODS Essential analyses, such as gene ontology (GO) enrichment and protein-protein interaction (PPI) analysis, were performed by querying ExPASy, the STRING database and DAVID Bioinformatics Resources. RESULTS Based on GO analysis, most of the proteins are located in the endoplasmic reticulum lumen, intracellular organelle lumen, membrane-enclosed lumen, and extracellular region. The relevant molecular functions are actin binding, metal ion binding, cation binding and ion binding. Cell adhesion, biological adhesion, cellular amino acid derivative metabolic process and homeostatic process are the related processes. Protein-protein interaction network analysis introduced five proteins (fibroblast growth factor receptor 4, tropomyosin 4, tropomyosin 2 (beta), lectin galactoside-binding soluble 3 binding protein and apolipoprotein A-I) as hub and bottleneck proteins. CONCLUSION Our results indicate that regulation of lipid metabolism and cell survival are important biological processes involved in cirrhosis. Further investigation of the above-mentioned proteins will provide a better understanding of cirrhosis.
Affiliation(s)
- Akram Safaei
- Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Afsaneh Arefi Oskouei
- Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mona Zamanian Azodi
- Proteomic Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Reza Mohebbi
- Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Abdol Rahim Nikzamir
- Faculty of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
10
Taha K, Homouz D, Al Muhairi H, Al Mahmoud Z. GRank: a middleware search engine for ranking genes by relevance to given genes. BMC Bioinformatics 2013; 14:251. [PMID: 23957362 PMCID: PMC3765412 DOI: 10.1186/1471-2105-14-251] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 02/16/2013] [Accepted: 08/13/2013] [Indexed: 11/24/2022]
Abstract
Background Biologists may need to know the set of genes that are semantically related to a given set of genes. For instance, a biologist may need to know the set of genes related to another set of genes known to be involved in a specific disease. Some works use the concept of gene clustering in order to identify semantically related genes. Others propose tools that return the set of genes that are semantically related to a given set of genes. Most of these gene similarity measures determine the semantic similarities among the genes based solely on the proximity to each other of the GO terms annotating the genes, while overlooking the structural dependencies among these GO terms, which may lead to low recall and precision of results. Results We propose in this paper a search engine called GRank, which overcomes the limitations of the current gene similarity measures outlined above as follows. It employs the concept of existence dependency to determine the structural dependencies among the GO terms annotating a given set of genes. After determining the set of genes that are semantically related to the input genes, GRank uses microarray experiment data to rank these genes based on their degree of relatedness to the input genes. We evaluated GRank experimentally and compared it with a comparable gene prediction tool called DynGO, which retrieves the genes and gene products that are relatives of input genes. Results showed marked improvement. Conclusions The experimental results demonstrated that GRank overcomes the limitations of current gene similarity measures. We attribute this performance to GRank’s use of the existence dependency concept for determining the semantic relationships among gene annotations. The recall and precision values for two benchmarking datasets showed that GRank outperforms the DynGO tool, which does not employ the concept of existence dependency.
The demo of GRank using 11000 KEGG yeast genes and a Gene Expression Omnibus (GEO) microarray file named “GSM34635.pad” is available at: http://ecesrvr.kustar.ac.ae:8080/ (click on the link labelled Gene Ontology 2).
Affiliation(s)
- Kamal Taha
- Department of Electrical and Computer Engineering, Khalifa University, Abu Dhabi, UAE.
11
Khan M, Vaes E, Mombaerts P. Temporal patterns of odorant receptor gene expression in adult and aged mice. Mol Cell Neurosci 2013; 57:120-9. [PMID: 23962816 DOI: 10.1016/j.mcn.2013.08.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 04/09/2013] [Revised: 08/05/2013] [Accepted: 08/09/2013] [Indexed: 01/27/2023]
Abstract
In the mouse, the sense of smell relies predominantly on the expression of ~1200 odorant receptor (OR) genes in the main olfactory epithelium (MOE). Each mature olfactory sensory neuron (OSN) in the MOE is thought to express just one of these OR genes; conversely, an OR gene is expressed in thousands to tens of thousands of OSNs per mouse. Here, we have characterized temporal patterns of OR gene expression in a cohort of inbred C57BL6/N mice from the Aged Rodent Colonies of the National Institute on Aging. We applied the NanoString multiplex platform to quantify RNA abundance for 531 OR genes in whole olfactory mucosa (WOM) tissue samples. The five study groups were females aged 2, 6, 12, 18, and 31 months (mo). We classified the 531 temporal patterns using a step-down quadratic regression method for time course analysis. The majority of OR genes (58.4%) are classified as flat: there is no significant difference from a horizontal line within this time window. There are 32.8% of OR genes with a downward profile, 7.2% with an upward profile, and 1.7% with a convex or concave profile. But the magnitude of these decreases and increases tends to be small: only 4.3% of OR genes are differentially expressed (DE) at 31 mo compared to 2 mo. Interestingly, the variances of NanoString counts for individual OR genes are homogeneous among the age groups. Our analyses of these 15,930 OR gene expression data of C57BL6/N mice that were raised and housed under well-controlled conditions indicate that OR gene expression at the MOE level is intrinsically stable.
Affiliation(s)
- Mona Khan
- Max Planck Research Unit for Molecular Neurogenetics, 60438 Frankfurt, Germany
12
Jupp S, Stevens R, Hoehndorf R. Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL. J Biomed Semantics 2012; 3 Suppl 1:S3. [PMID: 22541594 PMCID: PMC3337258 DOI: 10.1186/2041-1480-3-s1-s3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/21/2023]
Abstract
MOTIVATION Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product's function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontology's representations should be able to, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible. RESULTS To do this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed. 
CONCLUSION This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of the gene products in the GOAL ontology. OWL in combination with automated reasoning can be effectively used to query across ontologies to ask biologically rich questions. We have demonstrated that automated reasoning can be used to deliver practical on-line querying support for the ontology annotations available for the mouse. AVAILABILITY The GOAL Web page is to be found at http://owl.cs.manchester.ac.uk/goal.
Affiliation(s)
- Simon Jupp
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Robert Stevens
- School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
13
Mao KZ, Tang W. Recursive Mahalanobis separability measure for gene subset selection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:266-272. [PMID: 20479500 DOI: 10.1109/tcbb.2010.43] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/29/2023]
Abstract
Mahalanobis class separability measure provides an effective evaluation of the discriminative power of a feature subset, and is widely used in feature selection. However, this measure is computationally intensive or even prohibitive when it is applied to gene expression data. In this study, a recursive approach to Mahalanobis measure evaluation is proposed, with the goal of reducing computational overhead. Instead of evaluating Mahalanobis measure directly in high-dimensional space, the recursive approach evaluates the measure through successive evaluations in 2D space. Because of its recursive nature, this approach is extremely efficient when it is combined with a forward search procedure. In addition, it is noted that gene subsets selected by Mahalanobis measure tend to overfit training data and generalize unsatisfactorily on unseen test data, due to small sample size in gene expression problems. To alleviate the overfitting problem, a regularized recursive Mahalanobis measure is proposed in this study, and guidelines on determination of regularization parameters are provided. Experimental studies on five gene expression problems show that the regularized recursive Mahalanobis measure substantially outperforms the nonregularized Mahalanobis measures and the benchmark recursive feature elimination (RFE) algorithm in all five problems.
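To make the measure discussed above concrete, the sketch below evaluates a two-class Mahalanobis separability score for a pair of features in plain Python. The abstract gives neither the exact formula nor the form of the regularization, so this assumes the textbook definition (squared Mahalanobis distance between class means under a pooled covariance) and a simple ridge term added to the covariance diagonal. It does not reproduce the paper's recursive evaluation scheme; the function names and sample values are hypothetical.

```python
def class_mean(points):
    """Mean (x, y) of a list of 2D points."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n

def pooled_covariance(class_a, class_b):
    """Pooled sample covariance (sxx, sxy, syy) of two 2D classes."""
    sxx = sxy = syy = 0.0
    for points in (class_a, class_b):
        mx, my = class_mean(points)
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = len(class_a) + len(class_b) - 2
    return sxx / dof, sxy / dof, syy / dof

def mahalanobis_separability(class_a, class_b, ridge=0.0):
    """d^T (S + ridge * I)^-1 d, where d is the difference of class means
    and S the pooled covariance (assumed form of the regularization)."""
    sxx, sxy, syy = pooled_covariance(class_a, class_b)
    sxx += ridge
    syy += ridge
    (ax, ay), (bx, by) = class_mean(class_a), class_mean(class_b)
    dx, dy = ax - bx, ay - by
    det = sxx * syy - sxy ** 2
    # Closed-form inverse of the 2x2 covariance applied to d.
    return (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det

# Hypothetical expression values of two genes in two sample classes.
healthy = [(0, 0), (1, 0), (0, 1), (1, 1)]
tumour = [(3, 0), (4, 0), (3, 1), (4, 1)]
print(mahalanobis_separability(healthy, tumour))
print(mahalanobis_separability(healthy, tumour, ridge=1.0))
```

The ridge term shrinks the score and keeps the covariance invertible when samples are scarce, which is the usual motivation for regularizing in small-sample settings such as the one the abstract describes.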
Affiliation(s)
- K Z Mao
- School of Electrical and Electronic Engineering, Block S2.1, Nanyang Technological University, Singapore 639798.
|
14
|
Montecchi-Palazzi L, Kerrien S, Reisinger F, Aranda B, Jones AR, Martens L, Hermjakob H. The PSI semantic validator: a framework to check MIAPE compliance of proteomics data. Proteomics 2010; 9:5112-9. [PMID: 19834897 DOI: 10.1002/pmic.200900189] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/11/2022]
Abstract
The Human Proteome Organization's Proteomics Standards Initiative (PSI) promotes the development of exchange standards to improve data integration and interoperability. PSI specifies the suitable level of detail required when reporting a proteomics experiment (via the Minimum Information About a Proteomics Experiment), and provides extensible markup language (XML) exchange formats and dedicated controlled vocabularies (CVs) that must be combined to generate a standard compliant document. The framework presented here tackles the issue of checking that experimental data reported using a specific format, CVs and public bio-ontologies (e.g. Gene Ontology, NCBI taxonomy) are compliant with the Minimum Information About a Proteomics Experiment recommendations. The semantic validator not only checks the XML syntax but it also enforces rules regarding the use of an ontology class or CV terms by checking that the terms exist in the resource and that they are used in the correct location of a document. Moreover, this framework is extremely fast, even on sizable data files, and flexible, as it can be adapted to any standard by customizing the parameters it requires: an XML Schema Definition, one or more CVs or ontologies, and a mapping file describing in a formal way how the semantic resources and the format are interrelated. As such, the validator provides a general solution to the common problem in data exchange: how to validate the correct usage of a data standard beyond simple XML Schema Definition validation. The framework source code and its various applications can be found at http://psidev.info/validator.
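The two checks described in this abstract (does a term exist in the vocabulary, and is it used in an allowed location) can be sketched in a few lines. This is only a toy illustration, not the PSI validator's code or rule format: the `cvParam` element name, the `TOY:` accessions, and the `RULES` mapping are hypothetical stand-ins for a real controlled vocabulary and mapping file.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: accession -> term name.
CV = {"TOY:0001": "instrument model", "TOY:0002": "dissociation method"}

# Hypothetical mapping rules: parent element tag -> accessions allowed under it.
RULES = {"instrument": {"TOY:0001"}, "activation": {"TOY:0002"}}

def validate(xml_text):
    """Return a list of human-readable problems found in cvParam usage."""
    problems = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        # Only direct cvParam children of each element are checked here.
        for param in parent.findall("cvParam"):
            acc = param.get("accession")
            if acc not in CV:
                problems.append(f"{acc}: unknown term")
            elif acc not in RULES.get(parent.tag, set()):
                problems.append(f"{acc}: not allowed under <{parent.tag}>")
    return problems
```

A schema validator alone would accept any well-formed `cvParam`; the point of the second branch is that a valid term can still be wrong for the element it sits under.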
Affiliation(s)
- Luisa Montecchi-Palazzi
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.
|
15
|
Baker EJ, Jay JJ, Philip VM, Zhang Y, Li Z, Kirova R, Langston MA, Chesler EJ. Ontological Discovery Environment: a system for integrating gene-phenotype associations. Genomics 2009; 94:377-87. [PMID: 19733230 DOI: 10.1016/j.ygeno.2009.08.016] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 06/03/2009] [Revised: 08/19/2009] [Accepted: 08/27/2009] [Indexed: 10/20/2022]
Abstract
The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such as quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE's gene set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results.
This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental systems or species domain.
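To make the notion of set-set matching over gene-phenotype relations concrete, the sketch below stores each phenotype's gene set (one side of a bipartite graph) and scores every pair of sets by overlap. The Jaccard coefficient is an assumption chosen for illustration; the abstract does not state which similarity ODE uses, and the gene and phenotype labels are invented.

```python
from itertools import combinations

# Hypothetical bipartite relations: phenotype label -> set of associated genes.
GENE_SETS = {
    "alcohol_preference": {"Gabra2", "Drd2", "Adh1"},
    "ethanol_sedation": {"Gabra2", "Drd2", "Grin2b"},
    "coat_colour": {"Mc1r"},
}

def jaccard(a, b):
    """Overlap of two gene sets: |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_similarity(gene_sets):
    """Score every unordered pair of gene sets, keyed by sorted label pair."""
    return {(p, q): jaccard(gene_sets[p], gene_sets[q])
            for p, q in combinations(sorted(gene_sets), 2)}
```

Because the comparison uses only shared gene membership, two data sets can be matched even when their phenotype labels have nothing in common.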
Affiliation(s)
- Erich J Baker
- Department of Computer Science, Baylor University, Waco, TX, USA
|
16
|
Abstract
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.
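One common node-based measure is Resnik similarity, which scores two terms by the information content of their most informative common ancestor. The sketch below shows that calculation on a toy ontology; the term names, the parent links, and the annotation counts are all invented for illustration and do not come from the Gene Ontology or from the review.

```python
import math

# Hypothetical toy ontology (a DAG): term -> set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical number of entities annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "binding": 2, "catalysis": 6,
                 "dna_binding": 5, "rna_binding": 3}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(term):
    """-log of the term's frequency, counting annotations of its descendants."""
    total = sum(DIRECT_COUNTS.values())
    freq = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(freq / total)

def resnik(t1, t2):
    """Information content of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)
```

An edge-based measure would instead count the links between the two terms, which makes it sensitive to uneven depth and density in the ontology graph.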
Affiliation(s)
- Catia Pesquita
- LaSIGE, Faculty of Sciences, University of Lisboa, Lisboa, Portugal.
|
17
|
Abstract
Abstraction of intracellular biomolecular interactions into networks is useful for data integration and graph analysis. Network analysis tools facilitate predictions of novel functions for proteins, prediction of functional interactions and identification of intracellular modules. These efforts are linked with drug and phenotype data to accelerate drug-target and biomarker discovery. This review highlights the currently available varieties of mammalian biomolecular networks, and surveys methods and tools to construct, compare, integrate, visualise and analyse such networks.
Affiliation(s)
- A Ma'ayan
- Mount Sinai School of Medicine, Department of Pharmacology and Systems Therapeutics, New York, NY 10029-6574, USA.
|
18
|
Topalis P, Tzavlaki C, Vestaki K, Dialynas E, Sonenshine DE, Butler R, Bruggner RV, Stinson EO, Collins FH, Louis C. Anatomical ontologies of mosquitoes and ticks, and their web browsers in VectorBase. Insect Molecular Biology 2008; 17:87-89. [PMID: 18237287 DOI: 10.1111/j.1365-2583.2008.00781.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 05/25/2023]
Abstract
VectorBase, an integrated, relational database that manages genomic and other genetic/biological data pertaining to arthropod vectors of disease, has recently embarked on the construction of ontologies and controlled vocabularies (CVs). It aims, thus, at providing all necessary tools for the complete annotation of vector genomes and, in particular, the annotation of functional genomic data. This task was initiated with the development of anatomical ontologies of mosquitoes and ticks, both of which were made compliant to CARO, the common anatomy reference ontology. The ontologies are complemented by the development of novel web-based browsers that can show figures for anatomical terms, something that is especially helpful for fully illustrating the controlled vocabularies of anatomy.
Affiliation(s)
- P Topalis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, Heraklion, Crete, Greece
|
19
|
Abstract
Functional similarity based on Gene Ontology (GO) annotation is used in diverse applications like gene clustering, gene expression data analysis, protein interaction prediction and evaluation. However, there exists no comprehensive resource of functional similarity values although such a database would facilitate the use of functional similarity measures in different applications. Here, we describe FunSimMat (Functional Similarity Matrix, http://funsimmat.bioinf.mpi-inf.mpg.de/), a large new database that provides several different semantic similarity measures for GO terms. It offers various precomputed functional similarity values for proteins contained in UniProtKB and for protein families in Pfam and SMART. The web interface allows users to efficiently perform both semantic similarity searches with GO terms and functional similarity searches with proteins or protein families. All results can be downloaded in tab-delimited files for use with other tools. An additional XML–RPC interface gives automatic online access to FunSimMat for programs and remote services.
Affiliation(s)
- Andreas Schlicker
- Max Planck Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany.
|
20
|
Huang DW, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, Stephens R, Baseler MW, Lane HC, Lempicki RA. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res 2007; 35:W169-75. [PMID: 17576678 PMCID: PMC1933169 DOI: 10.1093/nar/gkm415] [Citation(s) in RCA: 1667] [Impact Index Per Article: 92.6] [Received: 01/22/2007] [Revised: 04/14/2007] [Accepted: 05/06/2007] [Indexed: 12/20/2022]
Abstract
All tools in the DAVID Bioinformatics Resources aim to provide functional interpretation of large lists of genes derived from genomic studies. The newly updated DAVID Bioinformatics Resources consists of the DAVID Knowledgebase and five integrated, web-based functional annotation tool suites: the DAVID Gene Functional Classification Tool, the DAVID Functional Annotation Tool, the DAVID Gene ID Conversion Tool, the DAVID Gene Name Viewer and the DAVID NIAID Pathogen Genome Browser. The expanded DAVID Knowledgebase now integrates almost all major and well-known public bioinformatics resources centralized by the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of diverse gene/protein identifiers and annotation terms from a variety of public bioinformatics databases. For any uploaded gene list, the DAVID Resources now provides not only the typical gene-term enrichment analysis, but also new tools and functions that allow users to condense large gene lists into gene functional groups, convert between gene/protein identifiers, visualize many-genes-to-many-terms relationships, cluster redundant and heterogeneous terms into groups, search for interesting and related genes or terms, dynamically view genes from their lists on bio-pathways and more. With DAVID (http://david.niaid.nih.gov), investigators gain more power to interpret the biological mechanisms associated with large gene lists.
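The "typical gene-term enrichment analysis" mentioned in this abstract is often computed as an upper-tail hypergeometric probability: how likely is it to see at least this many annotated genes in a list drawn at random from the background? The sketch below shows that plain calculation only. It is not DAVID's own scoring, and the function and parameter names are chosen for this example.

```python
from math import comb

def enrichment_p_value(population, annotated, list_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: genes in the background
    annotated:  background genes carrying the term
    list_size:  genes in the uploaded list
    hits:       list genes carrying the term
    """
    upper = min(annotated, list_size)
    total = comb(population, list_size)
    tail = sum(comb(annotated, i) * comb(population - annotated, list_size - i)
               for i in range(hits, upper + 1))
    return tail / total
```

For example, with a background of 20 genes of which 5 carry a term, a 5-gene list containing 3 annotated genes gives 1126/15504, roughly 0.073. In practice the p-values for many terms would also need multiple-testing correction.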
Affiliation(s)
- Da Wei Huang
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Brad T. Sherman
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Qina Tan
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Joseph Kir
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- David Liu
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- David Bryant
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Yongjian Guo
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Robert Stephens
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Michael W. Baseler
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- H. Clifford Lane
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Richard A. Lempicki
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
|
21
|
Othman RM, Deris S, Illias RM. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences. J Biomed Inform 2007; 41:65-81. [PMID: 17681495 DOI: 10.1016/j.jbi.2007.05.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 09/22/2006] [Revised: 05/28/2007] [Accepted: 05/29/2007] [Indexed: 11/19/2022]
Abstract
A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. The computational results and comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.
Affiliation(s)
- Razib M Othman
- Department of Software Engineering, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 UTM Skudai, Malaysia.
|
22
|
Schlicker A, Rahnenführer J, Albrecht M, Lengauer T, Domingues FS. GOTax: investigating biological processes and biochemical activities along the taxonomic tree. Genome Biol 2007; 8:R33. [PMID: 17346342 PMCID: PMC1868936 DOI: 10.1186/gb-2007-8-3-r33] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/23/2006] [Revised: 01/18/2007] [Accepted: 03/08/2007] [Indexed: 11/10/2022]
Abstract
We describe GOTax, a comparative genomics platform that integrates protein annotation with protein family classification and taxonomy. User-defined sets of proteins, protein families, annotation terms or taxonomic groups can be selected and compared, allowing for the analysis of distribution of biological processes and molecular activities over different taxonomic groups. In particular, a measure of functional similarity is available for comparing proteins and protein families, establishing functional relationships independent of evolution.
Affiliation(s)
- Andreas Schlicker
- Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Jörg Rahnenführer
- Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Mario Albrecht
- Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Thomas Lengauer
- Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Francisco S Domingues
- Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
|
23
|
Huang DW, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, Stephens R, Baseler MW, Lane HC, Lempicki RA. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol 2007; 8:R183. [PMID: 17784955 PMCID: PMC2375021 DOI: 10.1186/gb-2007-8-9-r183] [Citation(s) in RCA: 1784] [Impact Index Per Article: 99.1] [Received: 02/05/2007] [Revised: 04/20/2007] [Accepted: 09/04/2007] [Indexed: 12/16/2022]
Abstract
The DAVID Gene Functional Classification Tool (http://david.abcc.ncifcrf.gov) uses a novel agglomeration algorithm to condense a list of genes or associated biological terms into organized classes of related genes or biology, called biological modules. This organization is accomplished by mining the complex biological co-occurrences found in multiple sources of functional annotation. It is a powerful method to group functionally related genes and terms into a manageable number of biological modules for efficient interpretation of gene lists in a network context.
Affiliation(s)
- Da Wei Huang
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Brad T Sherman
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Qina Tan
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Jack R Collins
- Advanced Biomedical Computing Center, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- W Gregory Alvord
- Computer and Statistical Services, Data Management Services, National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Jean Roayaei
- Computer and Statistical Services, Data Management Services, National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Robert Stephens
- Advanced Biomedical Computing Center, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Michael W Baseler
- Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- H Clifford Lane
- Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Richard A Lempicki
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
|
24
|
Sealfon RSG, Hibbs MA, Huttenhower C, Myers CL, Troyanskaya OG. GOLEM: an interactive graph-based gene-ontology navigation and analysis tool. BMC Bioinformatics 2006; 7:443. [PMID: 17032457 PMCID: PMC1618863 DOI: 10.1186/1471-2105-7-443] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.3] [Received: 08/03/2006] [Accepted: 10/10/2006] [Indexed: 11/28/2022]
Abstract
Background: The Gene Ontology has become an extremely useful tool for the analysis of genomic data and structuring of biological knowledge. Several excellent software tools for navigating the gene ontology have been developed. However, no existing system provides an interactively expandable graph-based view of the gene ontology hierarchy. Furthermore, most existing tools are web-based or require an Internet connection, will not load local annotations files, and provide either analysis or visualization functionality, but not both. Results: To address the above limitations, we have developed GOLEM (Gene Ontology Local Exploration Map), a visualization and analysis tool for focused exploration of the gene ontology graph. GOLEM allows the user to dynamically expand and focus the local graph structure of the gene ontology hierarchy in the neighborhood of any chosen term. It also supports rapid analysis of an input list of genes to find enriched gene ontology terms. The GOLEM application permits the user either to utilize local gene ontology and annotations files in the absence of an Internet connection, or to access the most recent ontology and annotation information from the gene ontology webpage. GOLEM supports global and organism-specific searches by gene ontology term name, gene ontology id and gene name. Conclusion: GOLEM is a useful software tool for biologists interested in visualizing the local directed acyclic graph structure of the gene ontology hierarchy and searching for gene ontology terms enriched in genes of interest. It is freely available both as an application and as an applet at .
Affiliation(s)
- Rachel SG Sealfon
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, USA
- Matthew A Hibbs
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Labs, Princeton, NJ, USA
- Curtis Huttenhower
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Labs, Princeton, NJ, USA
- Chad L Myers
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Labs, Princeton, NJ, USA
- Olga G Troyanskaya
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Labs, Princeton, NJ, USA
|
25
|
Sun H, Fang H, Chen T, Perkins R, Tong W. GOFFA: gene ontology for functional analysis--a FDA gene ontology tool for analysis of genomic and proteomic data. BMC Bioinformatics 2006; 7 Suppl 2:S23. [PMID: 17118145 PMCID: PMC1683576 DOI: 10.1186/1471-2105-7-s2-s23] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Indexed: 11/26/2022]
Abstract
Background: Gene Ontology (GO) characterizes and categorizes the functions of genes and their products according to biological processes, molecular functions and cellular components, facilitating interpretation of data from high-throughput genomics and proteomics technologies. The most effective use of GO information is achieved when its rich and hierarchical complexity is retained and the information is distilled to the biological functions that are most germane to the phenomenon being investigated. Results: Here we present an FDA GO tool named Gene Ontology for Functional Analysis (GOFFA). GOFFA first ranks GO terms in the order of prevalence for a list of selected genes or proteins, and then it allows the user to interactively select GO terms according to their significance and specific biological complexity within the hierarchical structure. GOFFA provides five interactive functions (Tree view, Terms View, Genes View, GO Path and GO TreePrune) to analyze the GO data. Among the five functions, GO Path and GO TreePrune are unique. The GO Path simultaneously displays the ranks that order GOFFA Tree Paths based on statistical analysis. The GO TreePrune provides a visual display of a reduced GO term set based on a user's statistical cut-offs. Therefore, the GOFFA visual display can provide an intuitive depiction of the most likely relevant biological functions. Conclusion: With GOFFA, the user can dynamically interact with the GO data to interpret gene expression results in the context of biological plausibility, which can lead to new discoveries or identify new hypotheses. Availability: GOFFA is available through ArrayTrack software .
Affiliation(s)
- Hongmei Sun
- Z-tech Corporation, 3900 NCTR Road, Jefferson, Arkansas, 72079 USA
- Hong Fang
- Z-tech Corporation, 3900 NCTR Road, Jefferson, Arkansas, 72079 USA
- Tao Chen
- National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas, 72079 USA
- Roger Perkins
- Z-tech Corporation, 3900 NCTR Road, Jefferson, Arkansas, 72079 USA
- Weida Tong
- National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas, 72079 USA
|
26
|
Zhang P, Zhang J, Sheng H, Russo JJ, Osborne B, Buetow K. Gene functional similarity search tool (GFSST). BMC Bioinformatics 2006; 7:135. [PMID: 16536867 PMCID: PMC1421445 DOI: 10.1186/1471-2105-7-135] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 09/09/2005] [Accepted: 03/14/2006] [Indexed: 11/21/2022] Open
Abstract
Background: With the completion of the genome sequences of human, mouse, and other species and the advent of high throughput functional genomic research technologies such as biomicroarray chips, more and more genes and their products have been discovered and their functions have begun to be understood. Increasing amounts of data about genes, gene products and their functions have been stored in databases. To facilitate selection of candidate genes for gene-disease research, genetic association studies, biomarker and drug target selection, and animal models of human diseases, it is essential to have search engines that can retrieve genes by their functions from proteome databases. In recent years, the development of Gene Ontology (GO) has established structured, controlled vocabularies describing gene functions, which makes it possible to develop novel tools to search genes by functional similarity. Results: By using a statistical model to measure the functional similarity of genes based on the Gene Ontology directed acyclic graph, we developed a novel Gene Functional Similarity Search Tool (GFSST) to identify genes with related functions from annotated proteome databases. This search engine lets users design their search targets by gene functions. Conclusion: An implementation of GFSST which works on the UniProt (Universal Protein Resource) for the human and mouse proteomes is available at GFSST Web Server. GFSST provides functions not only for similar gene retrieval but also for gene search by one or more GO terms. This represents a powerful new approach for selecting similar genes and gene products from proteome databases according to their functions.
Affiliation(s)
- Peisen Zhang
- Laboratory of Population Genetics, National Cancer Institute, NIH, Bethesda, USA
- Jinghui Zhang
- Laboratory of Population Genetics, National Cancer Institute, NIH, Bethesda, USA
- Huitao Sheng
- Columbia Genome Center, Columbia University, New York, USA
- James J Russo
- Columbia Genome Center, Columbia University, New York, USA
- Kenneth Buetow
- Laboratory of Population Genetics, National Cancer Institute, NIH, Bethesda, USA
|