1
Wang G, Shen D, Zhang X, Ferrini MG, Li Y, Liao H. Comparison of critical biomarkers in 2 erectile dysfunction models based on GEO and NOS-cGMP-PDE5 pathway. Medicine (Baltimore) 2021; 100:e27508. [PMID: 34731136 PMCID: PMC8519209 DOI: 10.1097/md.0000000000027508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/28/2021] [Accepted: 09/25/2021] [Indexed: 01/05/2023]
Abstract
BACKGROUND Erectile dysfunction is a disease commonly caused by diabetes mellitus (DMED) and cavernous nerve injury (CNIED). In this study, bioinformatics analyses of differentially expressed genes (DEGs), enriched functions and pathways (EFPs), and protein-protein interaction (PPI) networks were carried out in DMED and CNIED rats. The critical biomarkers that may intervene in the nitric oxide synthase (NOS, predominantly nNOS, ancillary eNOS, and iNOS)-cyclic guanosine monophosphate (cGMP)-phosphodiesterase 5 enzyme (PDE5) pathway, an important mechanism in erectile dysfunction treatment, were then explored for potential clinical applications. METHODS GSE2457 and GSE31247 were downloaded. DEGs with |logFC (fold change)| > 0 were selected. The Database for Annotation, Visualization and Integrated Discovery (DAVID) online database was used to analyze the EFPs in Gene Ontology enrichment and Kyoto Encyclopedia of Genes and Genomes networks based on down-regulated and up-regulated DEGs, respectively. PPI analysis of the 2 datasets was performed in the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and Cytoscape. Interactions with an average score greater than 0.9 were treated as statistically significant. RESULTS Of the 1710 DEGs in GSE2457, 772 were down-regulated and 938 were up-regulated; of the 836 DEGs in GSE31247, 508 were down-regulated and 328 were up-regulated. Twenty-five common EFPs, such as aging and response to hormone, were identified in both models. PPI results showed that the top 10 hub genes in DMED were all different from those in CNIED. CONCLUSIONS The intervention of iNOS with the hub gene complement component 3 in DMED and the aging process in both DMED and CNIED deserve attention.
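The two filtering steps named in this abstract (splitting DEGs by the sign of logFC and keeping PPI edges scored above 0.9) can be illustrated with a minimal Python sketch. The function names, gene labels, and scores below are hypothetical toy values, not data or code from the study, which used DAVID, STRING, and Cytoscape rather than custom scripts.

```python
from typing import Dict, List, Tuple

def split_degs(log_fc: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Partition genes into (down, up) by the sign of logFC.

    Genes with logFC exactly 0 fail the |logFC| > 0 filter and are dropped.
    """
    down = sorted(g for g, v in log_fc.items() if v < 0)
    up = sorted(g for g, v in log_fc.items() if v > 0)
    return down, up

def filter_ppi(edges: List[Tuple[str, str, float]],
               cutoff: float = 0.9) -> List[Tuple[str, str, float]]:
    """Keep interactions whose score is strictly greater than the cutoff."""
    return [(a, b, s) for a, b, s in edges if s > cutoff]

# Hypothetical toy inputs, for illustration only.
toy_log_fc = {"GeneA": -1.2, "GeneB": 0.8, "GeneC": 0.0, "GeneD": 2.1}
toy_edges = [("GeneA", "GeneB", 0.95),
             ("GeneB", "GeneD", 0.90),
             ("GeneA", "GeneD", 0.40)]

down, up = split_degs(toy_log_fc)
strong = filter_ppi(toy_edges)
print(down, up, strong)
```

With a strict "greater than" comparison, an edge scored exactly 0.90 is excluded; whether the original analysis treated the boundary this way is not stated in the abstract.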
Affiliation(s)
- Guangying Wang
  - Department of Pharmacy, Shanxi Provincial People's Hospital of Shanxi Medical University, Taiyuan, China
- Dayue Shen
  - School of Pharmacy, Shanxi Medical University, Taiyuan, China
- Xilan Zhang
  - School of Pharmacy, Shanxi Medical University, Taiyuan, China
- Monica G. Ferrini
  - Department of Health and Life Sciences & Department of Internal Medicine, Charles R. Drew University, Los Angeles, CA
  - Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA
- Yuanping Li
  - Department of Pharmacy, Shanxi Provincial People's Hospital of Shanxi Medical University, Taiyuan, China
- Hui Liao
  - Department of Pharmacy, Shanxi Provincial People's Hospital of Shanxi Medical University, Taiyuan, China
2
Vega Yon GG, Thomas DC, Morrison J, Mi H, Thomas PD, Marjoram P. Bayesian parameter estimation for automatic annotation of gene functions using observational data and phylogenetic trees. PLoS Comput Biol 2021; 17:e1007948. [PMID: 33600408 PMCID: PMC7924801 DOI: 10.1371/journal.pcbi.1007948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2020] [Revised: 03/02/2021] [Accepted: 12/30/2020] [Indexed: 11/29/2022]
Abstract
Gene function annotation is important for a variety of downstream analyses of genetic data. But experimental characterization of function remains costly and slow, making computational prediction an important endeavor. Phylogenetic approaches to prediction have been developed, but implementation of a practical Bayesian framework for parameter estimation remains an outstanding challenge. We have developed a computationally efficient model of evolution of gene annotations using phylogenies based on a Bayesian framework using Markov Chain Monte Carlo for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well on leave-one-out cross-validation, and we further validated some of the predictions in the experimental scientific literature.
Affiliation(s)
- George G. Vega Yon
  - Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- Duncan C. Thomas
  - Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- John Morrison
  - Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- Huaiyu Mi
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- Paul D. Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- Paul Marjoram
  - Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
3
Abstract
Acute respiratory distress syndrome (ARDS) is characterized as a neutrophil-dominant disorder without effective pharmacological interventions. Knowledge of neutrophils in ARDS patients at the transcriptome level is still limited. We aimed to identify the hub genes and key pathways in neutrophils of patients with ARDS. The transcriptional profiles of neutrophils from ARDS patients and healthy volunteers were obtained from the GSE76293 dataset. The differentially expressed genes (DEGs) between ARDS and healthy samples were screened using the limma R package. Subsequently, functional and pathway enrichment analyses were performed based on the database for annotation, visualization, and integrated discovery (DAVID). The construction of a protein-protein interaction network was carried out using the search tool for the retrieval of interacting genes (STRING) database and the network was visualized by Cytoscape software. The Cytoscape plugins cytoHubba and MCODE were used to identify hub genes and significant modules. Finally, 136 upregulated genes and 95 downregulated genes were identified. Gene ontology analyses revealed MHC class II plays a major role in functional annotations. SLC11A1, ARG1, CHI3L1, HP, LCN2, and MMP8 were identified as hub genes, and they were all involved in the neutrophil degranulation pathway. The MAPK and neutrophil degranulation pathways in neutrophils were considered as key pathways in the pathogenesis of ARDS. This study improves our understanding of the biological characteristics of neutrophils and the mechanisms underlying ARDS, and key pathways and hub genes identified in this work can serve as targets for novel ARDS treatment strategies.
Affiliation(s)
- Lan Hu
  - Department of Intensive Care Unit, Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders (Chongqing); China International Science and Technology Cooperation base of Child development and Critical Disorders; Children's Hospital of Chongqing Medical University
  - Chongqing Key Laboratory of Pediatrics
  - Department of Outpatient, Children's Hospital of Chongqing Medical University, Chongqing, PR China
- Tianxin Zhao
  - Department of Intensive Care Unit, Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders (Chongqing); China International Science and Technology Cooperation base of Child development and Critical Disorders; Children's Hospital of Chongqing Medical University
  - Chongqing Key Laboratory of Pediatrics
- Yuelin Sun
  - Department of Intensive Care Unit, Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders (Chongqing); China International Science and Technology Cooperation base of Child development and Critical Disorders; Children's Hospital of Chongqing Medical University
  - Chongqing Key Laboratory of Pediatrics
- Yingfu Chen
  - Department of Intensive Care Unit, Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders (Chongqing); China International Science and Technology Cooperation base of Child development and Critical Disorders; Children's Hospital of Chongqing Medical University
  - Chongqing Key Laboratory of Pediatrics
- Ke Bai
  - Department of Intensive Care Unit, Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders (Chongqing); China International Science and Technology Cooperation base of Child development and Critical Disorders; Children's Hospital of Chongqing Medical University
  - Chongqing Key Laboratory of Pediatrics
- Feng Xu
  - Department of Intensive Care Unit, Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders (Chongqing); China International Science and Technology Cooperation base of Child development and Critical Disorders; Children's Hospital of Chongqing Medical University
  - Chongqing Key Laboratory of Pediatrics
4
Liu Z, Li S, Li W, Liu Q, Zhang L, Song X. Comparative transcriptome analysis indicates that a core transcriptional network mediates isonuclear alloplasmic male sterility in wheat (Triticum aestivum L.). BMC Plant Biol 2020; 20:10. [PMID: 31910796 PMCID: PMC6947873 DOI: 10.1186/s12870-019-2196-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/01/2019] [Accepted: 12/10/2019] [Indexed: 05/12/2023]
Abstract
BACKGROUND Cytoplasmic male sterility (CMS) plays a crucial role in the utilization of heterosis and various types of CMS often have different abortion mechanisms. Therefore, it is important to understand the molecular mechanisms related to anther abortion in wheat, which remain unclear at present. RESULTS In this study, five isonuclear alloplasmic male sterile lines (IAMSLs) and their maintainer were investigated. Cytological analysis indicated that the abortion type was identical in IAMSLs, typical and stainable abortion, and the key abortive period was in the binucleate stage. Most of the 1,281 core shared differentially expressed genes identified by transcriptome sequencing compared with the maintainer in the vital abortive stage were involved in the metabolism of sugars, oxidative phosphorylation, phenylpropane biosynthesis, and phosphatidylinositol signaling, and they were downregulated in the IAMSLs. Key candidate genes encoding chalcone--flavonone isomerase, pectinesterase, and UDP-glucose pyrophosphorylase were screened and identified. Moreover, further verification elucidated that due to the impact of downregulated genes in these pathways, the male sterile anthers were deficient in sugar and energy, with excessive accumulations of ROS, blocked sporopollenin synthesis, and abnormal tapetum degradation. CONCLUSIONS Through comparative transcriptome analysis, an intriguing core transcriptome-mediated male-sterility network was proposed and constructed for wheat and inferred that the downregulation of genes in important pathways may ultimately stunt the formation of the pollen outer wall in IAMSLs. These findings provide insights for predicting the functions of the candidate genes, and the comprehensive analysis of our results was helpful for studying the abortive interaction mechanism in CMS wheat.
Affiliation(s)
- Zihan Liu
  - College of Agronomy, Northwest A&F University, Yangling, Shaanxi, China
- Sha Li
  - College of Agronomy, Northwest A&F University, Yangling, Shaanxi, China
- Wei Li
  - College of Agronomy, Northwest A&F University, Yangling, Shaanxi, China
- Qi Liu
  - College of Agronomy, Northwest A&F University, Yangling, Shaanxi, China
- Lingli Zhang
  - College of Agronomy, Northwest A&F University, Yangling, Shaanxi, China
- Xiyue Song
  - College of Agronomy, Northwest A&F University, Yangling, Shaanxi, China
5
Trost N, Rempel E, Ermakova O, Tamirisa S, Pârcălăbescu L, Boutros M, Lohmann JU, Lohmann I. WEADE: A workflow for enrichment analysis and data exploration. PLoS One 2018; 13:e0204016. [PMID: 30265728 PMCID: PMC6161842 DOI: 10.1371/journal.pone.0204016] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/23/2018] [Accepted: 08/30/2018] [Indexed: 11/18/2022]
Abstract
Data analysis based on enrichment of Gene Ontology terms has become an important step in exploring large gene or protein expression datasets and several stand-alone or web tools exist for that purpose. However, a comprehensive and consistent analysis downstream of the enrichment calculation is missing so far. With WEADE we present a free web application that offers an integrated workflow for the exploration of genomic data combining enrichment analysis with a versatile set of tools to directly compare and intersect experiments or candidate gene lists of any size or origin including cross-species data. Lastly, WEADE supports the graphical representation of output data in the form of functional interaction networks based on prior knowledge, allowing users to go from plain expression data to functionally relevant candidate sub-lists in an interactive and consistent manner.
Affiliation(s)
- Nils Trost
  - Centre for Organismal Studies (COS), Heidelberg, Germany
- Eugen Rempel
  - Centre for Organismal Studies (COS), Heidelberg, Germany
- Olga Ermakova
  - Centre for Organismal Studies (COS), Heidelberg, Germany
- Jan U. Lohmann
  - Centre for Organismal Studies (COS), Heidelberg, Germany
- Ingrid Lohmann
  - Centre for Organismal Studies (COS), Heidelberg, Germany
6
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022]
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal
7
Lopez C, Tucker S, Salameh T, Tucker C. An unsupervised machine learning method for discovering patient clusters based on genetic signatures. J Biomed Inform 2018; 85:30-39. [PMID: 30016722 PMCID: PMC6621561 DOI: 10.1016/j.jbi.2018.07.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 12/21/2017] [Revised: 06/22/2018] [Accepted: 07/07/2018] [Indexed: 01/04/2023]
Abstract
INTRODUCTION Many chronic disorders have genomic etiology, disease progression, clinical presentation, and response to treatment that vary on a patient-to-patient basis. Such variability creates a need to identify characteristics within patient populations that have clinically relevant predictive value in order to advance personalized medicine. Unsupervised machine learning methods are suitable to address this type of problem, in which no a priori class label information is available to guide this search. However, it is challenging for existing methods to identify cluster memberships that are not just a result of natural sampling variation. Moreover, most of the current methods require researchers to provide specific input parameters a priori. METHOD This work presents an unsupervised machine learning method to cluster patients based on their genomic makeup without providing input parameters a priori. The method implements internal validity metrics to algorithmically identify the number of clusters, as well as statistical analyses to test for the significance of the results. Furthermore, the method takes advantage of the high degree of linkage disequilibrium between single nucleotide polymorphisms. Finally, a gene pathway analysis is performed to identify potential relationships between the clusters in the context of known biological knowledge. DATASETS AND RESULTS The method is tested with a cluster validation and a genomic dataset previously used in the literature. Benchmark results indicate that the proposed method provides the greatest performance out of the methods tested. Furthermore, the method is implemented on a sample genome-wide study dataset of 191 multiple sclerosis patients. The results indicate that the method was able to identify genetically distinct patient clusters without the need to select parameters a priori. 
Additionally, variants identified as significantly different between clusters are shown to be enriched for protein-protein interactions, especially in immune processes and cell adhesion pathways, via Gene Ontology term analysis. CONCLUSION Once links are drawn between clusters and clinically relevant outcomes, Immunochip data can be used to classify high-risk and newly diagnosed chronic disease patients into known clusters for predictive value. Further investigation can extend beyond pathway analysis to evaluate these clusters for clinical significance of genetically related characteristics such as age of onset, disease course, heritability, and response to treatment.
Affiliation(s)
- Christian Lopez
  - Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Scott Tucker
  - Hershey College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA; Engineering Science and Mechanics, The Pennsylvania State University, University Park, PA 16802, USA
- Tarik Salameh
  - Hershey College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA
- Conrad Tucker
  - Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA; Engineering Design Technology and Professional Programs, The Pennsylvania State University, University Park, PA 16802, USA; Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
8
Abstract
Long noncoding RNAs (LncRNAs) are important genes involved in a variety of biological functions, and they are aberrantly expressed in many types of disease. In this study, we describe LncRNA profiles in 4 pairs of human brain arteriovenous malformation (AVM) tissue and the corresponding fragment of superior temporal arteries (STA) or small scalp arteries (control arteries, CA), and we try to find LncRNAs that correlate with human brain AVM and with clinical symptoms. Four pairs of AVM tissues and corresponding STA or scalp artery fragments (depending on the operative approach) were collected from 4 AVM patients admitted to Beijing TianTan Hospital. LncRNA and mRNA expression profiling was then performed with the Arraystar-LncRNA array. Of the 28,012 LncRNAs that could be detected, 1931 were upregulated (>2 folds) and 1852 were downregulated (<2 folds). Of the 21,780 mRNAs that could be detected, 1577 were upregulated (>2 folds) and 1699 were downregulated (<2 folds). Four LncRNAs (ENST00000423394, ENST00000444114, TCONS_00013855, and ENST00000452148) were evaluated by qPCR in 14 pairs of AVM nidus and control tissue. These 4 LncRNAs were aberrantly expressed in AVM nidus compared with the control. LncRNA ENST00000423394 correlated with epilepsy (R = 0.34, P = .02, 95% confidence interval 0.08-0.85). We found that development of AVM may correspond with downregulation of NADPH reductase, lipoprotein lipase, and optic atrophy-related proteins. It may also correspond with upregulation of Fcγ receptor. The downregulation of NADPH reductase may correlate with seizures in AVM patients.
Affiliation(s)
- Xiong Li
  - Department of Neurosurgery, Beijing ChaoYang Hospital
- FuXin Lin
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University
- Jun Wu
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University
- Shuo Wang
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University
9
Berghout J, Li Q, Pouladi N, Li J, Lussier YA. Single subject transcriptome analysis to identify functionally signed gene set or pathway activity. Pac Symp Biocomput 2018; 23:400-411. [PMID: 29218900 PMCID: PMC5730358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Analysis of single-subject transcriptome response data is an unmet need of precision medicine, made challenging by the high dimension, dynamic nature and difficulty in extracting meaningful signals from biological or stochastic noise. We have proposed a method for single subject analysis that uses a mixture model for transcript fold-change clustering from isogenically paired samples, followed by integration of these distributions with Gene Ontology Biological Processes (GO-BP) to reduce dimension and identify functional attributes. We then extended these methods to develop functional signing metrics for gene set process regulation by incorporating biological repressor relationships encoded in GO-BP as negatively_regulates edges. Results revealed reproducible and biologically meaningful signals from analysis of a single subject's response, opening the door to future transcriptomic studies where subject and resource availability are currently limiting. We used inbred mouse strains fed different diets to provide isogenic biological replicates, permitting rigorous validation of our method. We compared significant genotype-specific GO-BP term results for overlap and rank order across three replicate pairs per genotype, and cross-methods to reference standards (limma+FET, SAM+FET, and GSEA). All single-subject analytics findings were robust and highly reproducible (median area under the ROC curve=0.96, n=24 genotypes × 3 replicates), providing confidence and validation of this approach for analyses in single subjects. R code is available online at http://www.lussiergroup.org/publications/PathwayActivity.
Affiliation(s)
- Joanne Berghout
  - Center for Biomedical Informatics and Biostatistics (CB2) & The Center for Applied Genetics and Genomic Medicine, Department of Medicine, University of Arizona, Tucson, AZ 85721, USA
10
Johnson TS, Li S, Kho JR, Huang K, Zhang Y. Network analysis of pseudogene-gene relationships: from pseudogene evolution to their functional potentials. Pac Symp Biocomput 2018; 23:536-547. [PMID: 29218912 PMCID: PMC5744670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as "junk DNAs", since they do not code proteins in normal tissues. Although most human pseudogenes do not have noticeable functions, ∼20% of them exhibit transcriptional activity. There is evidence that some pseudogenes have adopted functions as lncRNAs and work as regulators of gene expression. Furthermore, pseudogenes can even be "reactivated" in some conditions, such as cancer initiation. Some pseudogenes are transcribed in specific cancer types, and some are even translated into proteins, as observed in several cancer cell lines. All of the above shows that pseudogenes could have functional roles or potentials in the genome. Evaluating the relationships between pseudogenes and their gene counterparts could help us reveal the evolutionary path of pseudogenes and associate pseudogenes with functional potentials. It also provides insight into the regulatory networks involving pseudogenes with transcriptional and even translational activities. In this study, we develop a novel approach integrating graph analysis, sequence alignment and functional analysis to evaluate pseudogene-gene relationships, and apply it to human gene homologs and pseudogenes. We generated a comprehensive set of 445 pseudogene-gene (PGG) families from the original 3,281 gene families (13.56%). Of these, 438 (98.4% PGG, 13.3% total) were non-trivial (containing more than one pseudogene). Each PGG family contains multiple genes and pseudogenes with high sequence similarity. For each family, we generate a sequence alignment network and phylogenetic trees recapitulating the evolutionary paths. We find evidence supporting the evolutionary history of the olfactory family (both genes and pseudogenes) in human, which also supports the validity of our analysis method. Next, we evaluate these networks with respect to the gene ontology, from which we identify functions enriched in these pseudogene-gene families and infer the functional impact of pseudogenes involved in the networks. This demonstrates the application of our PGG network database in the study of pseudogene function in disease context.
Affiliation(s)
- Travis S Johnson
  - Dept. Biomedical Informatics, Ohio State University, 5000 HITS, 410 W. 10th St. Indianapolis, Indiana, 46202, USA
11
Harrington LX, Way GP, Doherty JA, Greene CS. Functional network community detection can disaggregate and filter multiple underlying pathways in enrichment analyses. Pac Symp Biocomput 2018; 23:157-167. [PMID: 29218878 PMCID: PMC5760988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Differential expression experiments or other analyses often end in a list of genes. Pathway enrichment analysis is one method to discern important biological signals and patterns from noisy expression data. However, pathway enrichment analysis may perform suboptimally in situations where there are multiple implicated pathways - such as in the case of genes that define subtypes of complex diseases. Our simulation study shows that in this setting, standard overrepresentation analysis identifies many false positive pathways along with the true positives. These false positives hamper investigators' attempts to glean biological insights from enrichment analysis. We develop and evaluate an approach that combines community detection over functional networks with pathway enrichment to reduce false positives. Our simulation study demonstrates that a large reduction in false positives can be obtained with a small decrease in power. Though we hypothesized that multiple communities might underlie previously described subtypes of high-grade serous ovarian cancer and applied this approach, our results do not support this hypothesis. In summary, applying community detection before enrichment analysis may ease interpretation for complex gene sets that represent multiple distinct pathways.
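The "standard overrepresentation analysis" this abstract refers to is commonly implemented as a one-sided hypergeometric test. The sketch below illustrates only that baseline test, not the authors' community-detection step; the function name and all counts are hypothetical.

```python
from math import comb

def ora_p_value(universe: int, pathway: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe: number of genes in the background
    pathway:  number of background genes annotated to the pathway
    selected: size of the gene list under test
    overlap:  number of selected genes that fall in the pathway
    """
    upper = min(pathway, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical toy counts: 20 background genes, a 5-gene pathway,
# a 5-gene list, and 3 genes shared between them.
p = ora_p_value(universe=20, pathway=5, selected=5, overlap=3)
print(round(p, 4))
```

In practice such p-values are computed for many pathways at once and then corrected for multiple testing, which is where a gene list spanning several distinct pathways can produce the false positives the abstract describes.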
Affiliation(s)
- Lia X Harrington
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover 03784, USA
12
Nepomuceno JA, Troncoso A, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS. Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Comput Methods Programs Biomed 2015; 119:163-80. [PMID: 25843807 DOI: 10.1016/j.cmpb.2015.02.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/22/2014] [Revised: 02/17/2015] [Accepted: 02/27/2015] [Indexed: 05/06/2023]
Abstract
Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of an independent activation with respect to the same experimental condition and not due to the same regulatory regime. For this reason, traditional techniques are recently being improved with the use of prior biological knowledge from open-access repositories together with gene expression data. Biclustering is an unsupervised machine learning technique that searches patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the input of the algorithm is only a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by means of its addition to-be-optimized fitness function in the scatter search scheme. The measure FracGO is based on the biological enrichment and SimNTO is based on the overlapping among GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm for two datasets and show the algorithm performs better when biological knowledge is integrated. Moreover, the analysis and comparison between the two different biological measures is presented and it is concluded that the differences depend on both the data source and how the annotation file has been built in the case GO is used. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmark and an analysis of the overlapping among biclusters reveals that the biclusters obtained present a low overlapping. 
The proposed methodology is a general-purpose algorithm which allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.
Affiliation(s)
- Juan A Nepomuceno
  - Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Alicia Troncoso
  - Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
- Isabel A Nepomuceno-Chamorro
  - Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Jesús S Aguilar-Ruiz
  - Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
|
13
|
Nixon SE, González-Peña D, Lawson MA, McCusker RH, Hernandez AG, O’Connor JC, Dantzer R, Kelley KW, Rodriguez-Zas SL. Analytical workflow profiling gene expression in murine macrophages. J Bioinform Comput Biol 2015; 13:1550010. [PMID: 25708305 PMCID: PMC4539142 DOI: 10.1142/s0219720015500109] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled the RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution), and transcripts were then tested for differential expression (False Discovery Rate-adjusted p-value < 0.05). Functional category enrichment was assessed on the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. The 92 enriched categories from Gene Ontology Biological Processes and Molecular Functions and from KEGG pathways were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides context for the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge, using high-throughput RNA-Seq.
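The abstract above reports a False Discovery Rate-adjusted p-value cutoff of 0.05 without naming the adjustment procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up adjustment (an assumption, not confirmed by the source), adjusted p-values could be computed as follows. The function name `benjamini_hochberg` and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the source only says "False Discovery Rate-adjusted";
    BH is used here as a common choice, not as the authors' method.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four transcripts.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
# Indices of transcripts passing the adjusted 0.05 cutoff.
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
```

Working from the largest p-value downward keeps the adjusted values non-decreasing in the raw p-values, which is what makes a single cutoff on the adjusted values meaningful.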
Affiliation(s)
- Scott E. Nixon
  - Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Dianelys González-Peña
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Marcus A. Lawson
  - Division of Nutritional Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Robert H. McCusker
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Alvaro G. Hernandez
  - Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jason C. O’Connor
  - Department of Pharmacology, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Robert Dantzer
  - Department of Symptom Research, University of Texas M. D. Anderson Cancer Center, Houston, TX 77030, USA
- Keith W. Kelley
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Sandra L. Rodriguez-Zas
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Statistics, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
|
14
|
Glicksberg BS, Li L, Cheng WY, Shameer K, Hakenberg J, Castellanos R, Ma M, Shi L, Shah H, Dudley JT, Chen R. An integrative pipeline for multi-modal discovery of disease relationships. Pac Symp Biocomput 2015; 20:407-18. [PMID: 25592600 PMCID: PMC4345399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
In the past decade there has been an explosion in genetic research that has resulted in the generation of enormous quantities of disease-related data. In the current study, we have compiled disease risk gene variant information and Electronic Medical Record (EMR) classification codes from various repositories for 305 diseases. Using such data, we developed a pipeline to test for clinical prevalence, gene-variant overlap, and literature presence for all 46,360 unique disease pairs. To determine whether disease pairs were enriched, we systematically employed both Fisher's exact test (medical and literature) and Term Frequency-Inverse Document Frequency (genetics) methodologies, defining statistical significance at a Bonferroni-adjusted threshold of p < 1 × 10^-6 and a weighted q < 0.05, respectively. We hypothesize that disease pairs that are statistically enriched in the medical and genetic spheres, but not in the literature, have the potential to reveal non-obvious connections between clinically disparate phenotypes. Using this pipeline, we identified 2,316 disease pairs that were significantly enriched within an EMR and 213 that were enriched genetically. Of these, 65 disease pairs were statistically enriched in both, 19 of which are believed to be novel. These non-obvious relationships between disease pairs are suggestive of a shared underlying etiology with clinical presentation. Further investigation of the uncovered disease-pair relationships has the potential to provide insights into the architecture of complex diseases and to update existing knowledge of risk factors.
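The enrichment test described in this abstract can be illustrated with a small, self-contained sketch. It checks the pair count (305 diseases give 46,360 unordered pairs) and runs a one-sided Fisher's exact test on a single 2×2 co-occurrence table. The function name, the table layout, the toy counts, and the family-wise α of 0.05 are assumptions made for illustration; the source does not state how its threshold was derived or how its tables were built.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: a = patients with both diseases, b = disease A only,
    c = disease B only, d = neither. Returns P(X >= a) under the
    hypergeometric null with all margins fixed.
    """
    row1 = a + b          # patients with disease A
    col1 = a + c          # patients with disease B
    n = a + b + c + d     # all patients
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Number of unordered pairs among 305 diseases.
n_pairs = comb(305, 2)  # 46360

# Bonferroni threshold under an assumed family-wise alpha of 0.05.
alpha = 0.05
bonferroni_threshold = alpha / n_pairs  # roughly 1.08e-6

# Toy co-occurrence counts for one disease pair (hypothetical data).
p_value = fisher_exact_greater(40, 60, 50, 9850)
is_enriched = p_value < bonferroni_threshold
```

Under the assumed α of 0.05, the Bonferroni threshold works out to about 1.08 × 10^-6, which is in line with the p < 1 × 10^-6 cutoff reported in the abstract.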
Affiliation(s)
- Benjamin S. Glicksberg
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Li Li
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Wei-Yi Cheng
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Khader Shameer
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Jörg Hakenberg
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Rafael Castellanos
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Meng Ma
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Lisong Shi
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Hardik Shah
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Joel T. Dudley
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
- Rong Chen
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York City, NY 10029, USA
|
15
|
Vembu S, Morris Q. An efficient algorithm to integrate network and attribute data for gene function prediction. Pac Symp Biocomput 2014:388-399. [PMID: 24297564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Label propagation methods are extremely well-suited for a variety of biomedical prediction tasks based on network data. However, these algorithms cannot be used to integrate feature-based data sources with networks. We propose an efficient learning algorithm to integrate these two types of heterogeneous data sources to perform binary prediction tasks on node features (e.g., gene prioritization, disease gene prediction). Our method, LMGraph, consists of two steps. In the first step, we extract a small set of "network features" from the nodes of networks that represent connectivity with labeled nodes in the prediction tasks. In the second step, we apply a simple weighting scheme in conjunction with linear classifiers to combine these network features with other feature data. This two-step procedure allows us to (i) learn highly scalable and computationally efficient linear classifiers, and (ii) seamlessly combine feature-based data sources with networks. Our method is much faster than label propagation, which is already known to be computationally efficient on large-scale prediction problems. Experiments on multiple functional interaction networks from three species (mouse, fly, C. elegans) with tens of thousands of nodes and hundreds of binary prediction tasks demonstrate the efficacy of our method.
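The two-step idea in this abstract can be sketched on a toy graph. This is not the LMGraph implementation: the feature definition (summed edge weight to positively and negatively labeled neighbors), the fixed source weights, the perceptron used as the linear classifier, and all names and data below are assumptions chosen only to illustrate the general shape of "network features plus a weighted linear model".

```python
def network_features(adj, labels):
    """Step 1 (assumed form): connectivity of each node to labeled nodes.

    adj maps node -> {neighbor: edge weight}; labels maps node -> +1 or -1.
    Returns node -> (weight to positive neighbors, weight to negative neighbors).
    """
    feats = {}
    for node, nbrs in adj.items():
        pos = sum(w for v, w in nbrs.items() if labels.get(v) == 1)
        neg = sum(w for v, w in nbrs.items() if labels.get(v) == -1)
        feats[node] = (pos, neg)
    return feats


def combine(net, attr, net_weight=1.0, attr_weight=1.0):
    """Step 2a (assumed form): scale each source, then concatenate."""
    return {n: [net_weight * x for x in net[n]] + [attr_weight * x for x in attr[n]]
            for n in net}


def train_perceptron(X, y, epochs=20):
    """Step 2b: fit a linear classifier sign(w.x + b) with the perceptron rule."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified: shift the boundary toward xi
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Hypothetical functional network, labels, and one attribute per gene.
adj = {
    "g1": {"g2": 1.0, "g3": 0.5},
    "g2": {"g1": 1.0, "g3": 0.8},
    "g3": {"g1": 0.5, "g2": 0.8, "g4": 0.1},
    "g4": {"g3": 0.1, "g5": 0.9},
    "g5": {"g4": 0.9, "g6": 1.0},
    "g6": {"g5": 1.0},
}
labels = {"g1": 1, "g2": 1, "g5": -1, "g6": -1}
attr = {"g1": [0.2], "g2": [0.1], "g3": [0.3],
        "g4": [-0.2], "g5": [-0.1], "g6": [-0.3]}

features = combine(network_features(adj, labels), attr)
train_nodes = sorted(labels)
w, b = train_perceptron([features[n] for n in train_nodes],
                        [labels[n] for n in train_nodes])
predictions = {n: predict(w, b, features[n]) for n in ("g3", "g4")}
```

Because the network is reduced to a few numeric columns per node, any linear learner can consume it alongside ordinary attribute data, which is the property the abstract credits for scalability.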
Affiliation(s)
- Shankar Vembu
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
|
16
|
Žitnik M, Zupan B. Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold. Pac Symp Biocomput 2014:400-411. [PMID: 24297565 PMCID: PMC3902649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker's yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps.
Affiliation(s)
- Marinka Žitnik
  - Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, SI-1000, Slovenia
- Blaž Zupan
  - Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, SI-1000, Slovenia; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
17
|
Funk CS, Hunter LE, Cohen KB. Combining heterogenous data for prediction of disease related and pharmacogenes. Pac Symp Biocomput 2014:328-39. [PMID: 24297559 PMCID: PMC3910248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Identifying genetic variants that affect drug response or play a role in disease is an important task for clinicians and researchers. Before individual variants can be explored efficiently for effect on drug response or disease relationships, specific candidate genes must be identified. While many methods rank candidate genes through the use of sequence features and network topology, only a few exploit the information contained in the biomedical literature. In this work, we train and test a classifier on known pharmacogenes from PharmGKB and present a classifier that predicts pharmacogenes on a genome-wide scale using only Gene Ontology annotations and simple features mined from the biomedical literature. Performance of F=0.86, AUC=0.860 is achieved. The top 10 predicted genes are analyzed. Additionally, a set of enriched pharmacogenic Gene Ontology concepts is produced.
Affiliation(s)
- Christopher S. Funk
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Lawrence E. Hunter
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA
- K. Bretonnel Cohen
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA
|
18
|
Clark WT, Radivojac P. Vector quantization kernels for the classification of protein sequences and structures. Pac Symp Biocomput 2014:316-327. [PMID: 24297558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
We propose a new kernel-based method for the classification of protein sequences and structures. We first represent each protein as a set of time series data using several structural, physicochemical, and predicted properties such as a sequence of consecutive dihedral angles, hydrophobicity indices, or predictions of disordered regions. A kernel function is then computed for pairs of proteins, exploiting the principles of vector quantization and subsequently used with support vector machines for protein classification. Although our method requires a significant pre-processing step, it is fast in the training and prediction stages owing to the linear complexity of kernel computation with the length of protein sequences. We evaluate our approach on two protein classification tasks involving the prediction of SCOP structural classes and catalytic activity according to the Gene Ontology. We provide evidence that the method is competitive when compared to string kernels, and useful for a range of protein classification tasks. Furthermore, the applicability of our approach extends beyond computational biology to any classification of time series data.
Affiliation(s)
- Wyatt T Clark
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana 47405, USA
|