1
Sun N, Akay LA, Murdock MH, Park Y, Galiana-Melendez F, Bubnys A, Galani K, Mathys H, Jiang X, Ng AP, Bennett DA, Tsai LH, Kellis M. Single-nucleus multiregion transcriptomic analysis of brain vasculature in Alzheimer's disease. Nat Neurosci 2023; 26:970-982. [PMID: 37264161 PMCID: PMC10464935 DOI: 10.1038/s41593-023-01334-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 02/15/2022] [Accepted: 04/17/2023] [Indexed: 06/03/2023]
Abstract
Cerebrovascular dysregulation is a hallmark of Alzheimer's disease (AD), but the changes that occur in specific cell types have not been fully characterized. Here, we profile single-nucleus transcriptomes in the human cerebrovasculature in six brain regions from 220 individuals with AD and 208 age-matched controls. We annotate 22,514 cerebrovascular cells, including 11 subtypes of endothelial, pericyte, smooth muscle, perivascular fibroblast and ependymal cells. We identify 2,676 differentially expressed genes in AD, including downregulation of PDGFRB in pericytes, and of ABCB1 and ATP10A in endothelial cells, and validate the downregulation of SLC6A1 and upregulation of APOD, INSR and COL4A1 in postmortem AD brain tissues. We detect vasculature, glial and neuronal coexpressed gene modules, suggesting coordinated neurovascular unit dysregulation in AD. Integration with AD genetics reveals 125 AD differentially expressed genes directly linked to AD-associated genetic variants. Lastly, we show that APOE4 genotype-associated differences are significantly enriched among AD-associated genes in capillary and venule endothelial cells, as well as subsets of pericytes and fibroblasts.
Affiliation(s)
- Na Sun
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Leyla Anne Akay
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mitchell H Murdock
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yongjin Park
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Pathology and Laboratory Medicine, Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Molecular Oncology, BC Cancer, Vancouver, British Columbia, Canada
- Fabiola Galiana-Melendez
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Adele Bubnys
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kyriaki Galani
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hansruedi Mathys
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Xueqiao Jiang
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ayesha P Ng
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Li-Huei Tsai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
2
Bernasconi A, Canakoglu A, Comolli F. Processing genome-wide association studies within a repository of heterogeneous genomic datasets. BMC Genom Data 2023; 24:13. [PMID: 36869294 PMCID: PMC9985298 DOI: 10.1186/s12863-023-01111-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 02/02/2023] [Indexed: 03/05/2023] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) are based on the observation of genome-wide sets of genetic variants - typically single-nucleotide polymorphisms (SNPs) - in different individuals that are associated with phenotypic traits. Research efforts have so far been directed to improving GWAS techniques rather than to making the results of GWAS interoperable with other genomic signals; this is currently hindered by the use of heterogeneous formats and uncoordinated experiment descriptions. RESULTS To practically facilitate integrative use, we propose to include GWAS datasets within the META-BASE repository, exploiting an integration pipeline previously studied for other genomic datasets that includes several heterogeneous data types in the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository of genomic datasets, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated using two important data sources, initially organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). The integration effort finally allows us to use these datasets within multi-sample processing queries that respond to important biological questions. These are then made usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, and epigenetic signals. CONCLUSIONS As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; 2) their big data processing by means of the GenoMetric Query Language and associated system. Future large-scale tertiary data analysis may extensively benefit from the addition of GWAS results to inform several different downstream analysis workflows.
Affiliation(s)
- Anna Bernasconi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy
- Arif Canakoglu
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy
- Federico Comolli
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy
3
Xu J, Mao C, Hou Y, Luo Y, Binder JL, Zhou Y, Bekris LM, Shin J, Hu M, Wang F, Eng C, Oprea TI, Flanagan ME, Pieper AA, Cummings J, Leverenz JB, Cheng F. Interpretable deep learning translation of GWAS and multi-omics findings to identify pathobiology and drug repurposing in Alzheimer's disease. Cell Rep 2022; 41:111717. [PMID: 36450252 PMCID: PMC9837836 DOI: 10.1016/j.celrep.2022.111717] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/29/2022] [Revised: 09/01/2022] [Accepted: 11/02/2022] [Indexed: 12/03/2022] Open
Abstract
Translating human genetic findings (genome-wide association studies [GWAS]) to pathobiology and therapeutic discovery remains a major challenge for Alzheimer's disease (AD). We present a network topology-based deep learning framework to identify disease-associated genes (NETTAG). We leverage non-coding GWAS loci effects on quantitative trait loci, enhancers and CpG islands, promoter regions, open chromatin, and promoter flanking regions under the protein-protein interactome. Via NETTAG, we identified 156 AD-risk genes enriched in druggable targets. Combining network-based prediction and retrospective case-control observations with 10 million individuals, we identified that usage of four drugs (ibuprofen, gemfibrozil, cholecalciferol, and ceftriaxone) is associated with reduced likelihood of AD incidence. Gemfibrozil (an approved lipid regulator) is significantly associated with 43% reduced risk of AD compared with simvastatin using an active-comparator design (95% confidence interval 0.51-0.63, p < 0.0001). In summary, NETTAG offers a deep learning methodology that utilizes GWAS and multi-genomic findings to identify pathobiology and drug repurposing in AD.
Affiliation(s)
- Jielin Xu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Chengsheng Mao
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA
- Yuan Hou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Yuan Luo
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA
- Jessica L Binder
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA
- Yadi Zhou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Lynn M Bekris
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Jiyoung Shin
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA
- Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44106, USA
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Charis Eng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Tudor I Oprea
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA
- Margaret E Flanagan
- Department of Pathology and Mesulam Center for Cognitive Neurology and Alzheimer's Disease, Feinberg School of Medicine, Chicago, IL 60611, USA
- Andrew A Pieper
- Harrington Discovery Institute, University Hospitals Cleveland Medical Center, Cleveland, OH 44106, USA; Department of Psychiatry, Case Western Reserve University, Cleveland, OH 44106, USA; Geriatric Psychiatry, GRECC, Louis Stokes Cleveland VA Medical Center, Cleveland, OH 44106, USA; Institute for Transformative Molecular Medicine, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA; Department of Neuroscience, Case Western Reserve University, School of Medicine, Cleveland, OH 44106, USA
- Jeffrey Cummings
- Chambers-Grundy Center for Transformative Neuroscience, Department of Brain Health, School of Integrated Health Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- James B Leverenz
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Lou Ruvo Center for Brain Health, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA.
4
Lenk HÇ, Klöditz K, Johansson I, Smith RL, Jukić MM, Molden E, Ingelman-Sundberg M. The Polymorphic Nuclear Factor NFIB Regulates Hepatic CYP2D6 Expression and Influences Risperidone Metabolism in Psychiatric Patients. Clin Pharmacol Ther 2022; 111:1165-1174. [PMID: 35253216 PMCID: PMC9314634 DOI: 10.1002/cpt.2571] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/16/2021] [Accepted: 02/21/2022] [Indexed: 12/02/2022]
Abstract
The genetic background for interindividual variability of the polymorphic CYP2D6 enzyme activity remains incompletely understood and the role of NFIB genetic polymorphism for this variability was evaluated in this translational study. We investigated the effect of NFIB expression in vitro using 3D liver spheroids, Huh7 cells, and the influence of the NFIB polymorphism on metabolism of risperidone in patients in vivo. We found that NFIB regulates several important pharmacogenes, including CYP2D6. NFIB inhibited CYP2D6 gene expression in Huh7 cells and NFIB expression in livers was predominantly nuclear and reduced at the mRNA and protein level in carriers of the NFIB rs28379954 T>C allele. Based on 604 risperidone treated patients genotyped for CYP2D6 and NFIB, we found that the rate of risperidone hydroxylation was elevated in NFIB rs28379954 T>C carriers among CYP2D6 normal metabolizers, resulting in a similar rate of drug metabolism to what is observed in CYP2D6 ultrarapid metabolizers, with no such effect observed in CYP2D6 poor metabolizers lacking functional enzyme. The results indicate that NFIB constitutes a novel nuclear factor in the regulation of cytochrome P450 genes, and that its polymorphism is a predictor for the rate of CYP2D6 dependent drug metabolism in vivo.
Affiliation(s)
- Hasan Çağın Lenk
- Center for Psychopharmacology, Diakonhjemmet Hospital, Vinderen, Oslo, Norway; Section for Pharmacology and Pharmaceutical Biosciences, Department of Pharmacy, University of Oslo, Oslo, Norway
- Katharina Klöditz
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Karolinska Institutet, Stockholm, Sweden
- Inger Johansson
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Karolinska Institutet, Stockholm, Sweden
- Robert Løvsletten Smith
- Center for Psychopharmacology, Diakonhjemmet Hospital, Vinderen, Oslo, Norway; NORMENT, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Marin M Jukić
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Karolinska Institutet, Stockholm, Sweden; Department of Physiology, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Espen Molden
- Center for Psychopharmacology, Diakonhjemmet Hospital, Vinderen, Oslo, Norway; Section for Pharmacology and Pharmaceutical Biosciences, Department of Pharmacy, University of Oslo, Oslo, Norway
- Magnus Ingelman-Sundberg
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Karolinska Institutet, Stockholm, Sweden
5
Grissa D, Junge A, Oprea TI, Jensen LJ. Diseases 2.0: a weekly updated database of disease–gene associations from text mining and data integration. Database (Oxford) 2022; 2022:6554833. [PMID: 35348648 PMCID: PMC9216524 DOI: 10.1093/database/baac019] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/07/2021] [Revised: 02/14/2022] [Accepted: 03/11/2022] [Indexed: 12/04/2022]
Abstract
The scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease–gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease–gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated on a weekly basis and are available via a web interface at https://diseases.jensenlab.org, from where they can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org
Affiliation(s)
- Dhouha Grissa
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
- Alexander Junge
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
- Tudor I Oprea
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
- Department of Internal Medicine, Division of Translational Informatics, University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
6
Sun J, Lyu R, Deng L, Li Q, Zhao Y, Zhang Y. SMetABF: A rapid algorithm for Bayesian GWAS meta-analysis with a large number of studies included. PLoS Comput Biol 2022; 18:e1009948. [PMID: 35286307 PMCID: PMC8947622 DOI: 10.1371/journal.pcbi.1009948] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/08/2021] [Revised: 03/24/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022] Open
Abstract
Bayesian methods are widely used in the GWAS meta-analysis. But the considerable consumption in both computing time and memory space poses great challenges for large-scale meta-analyses. In this research, we propose an algorithm named SMetABF to rapidly obtain the optimal ABF in the GWAS meta-analysis, where shotgun stochastic search (SSS) is introduced to improve the Bayesian GWAS meta-analysis framework, MetABF. Simulation studies confirm that SMetABF performs well in both speed and accuracy, compared to exhaustive methods and MCMC. SMetABF is applied to real GWAS datasets to find several essential loci related to Parkinson's disease (PD) and the results support the underlying relationship between PD and other autoimmune disorders. Developed as an R package and a web tool, SMetABF will become a useful tool to integrate different studies and identify more variants associated with complex traits.
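As a rough illustration of the approximate Bayes factor (ABF) that such a meta-analysis optimizes, the sketch below combines per-study effect estimates with inverse-variance (fixed-effect) weights and then evaluates a Wakefield-style ABF in favor of association. This is not the SMetABF or MetABF algorithm, which searches over subsets of studies; the function names and the `prior_sd` value are assumptions made for illustration only.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_combine(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) estimate and its standard error."""
    if len(betas) != len(ses) or len(betas) == 0:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Log approximate Bayes factor for H1 (effect ~ N(0, prior_sd^2)) versus H0 (effect = 0).

    prior_sd is a hypothetical prior scale chosen by the analyst.
    """
    v = se * se              # sampling variance of the estimate
    w = prior_sd * prior_sd  # prior variance of the true effect under H1
    z2 = (beta * beta) / v   # squared z-score
    return 0.5 * math.log(v / (v + w)) + 0.5 * z2 * w / (v + w)


if __name__ == "__main__":
    # Made-up summary statistics for one variant in three studies.
    beta, se = fixed_effect_combine([0.21, 0.18, 0.25], [0.08, 0.06, 0.10])
    print(log_abf(beta, se, prior_sd=0.2))
```

Working on the log scale avoids overflow when the z-score is large, which is common for genome-wide significant variants.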
Affiliation(s)
- Jianle Sun
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Ruiqi Lyu
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Luojia Deng
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Qianwen Li
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Yang Zhao
- Department of Biostatistics, Nanjing Medical University School of Public Health, Nanjing, Jiangsu, China
- Yue Zhang
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
7
Lin CX, Li HD, Deng C, Liu W, Erhardt S, Wu FX, Zhao XM, Guan Y, Wang J, Wang D, Hu B, Wang J. An integrated brain-specific network identifies genes associated with neuropathologic and clinical traits of Alzheimer’s disease. Brief Bioinform 2021; 23:6483067. [DOI: 10.1093/bib/bbab522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Revised: 10/26/2021] [Accepted: 11/13/2021] [Indexed: 11/12/2022] Open
Abstract
Alzheimer’s disease (AD) has a strong genetic predisposition. However, its risk genes remain incompletely identified. We developed an Alzheimer’s brain gene network-based approach to predict AD-associated genes by leveraging the functional pattern of known AD-associated genes. Our constructed network outperformed existing networks in predicting AD genes. We then systematically validated the predictions using independent genetic, transcriptomic, proteomic data, neuropathological and clinical data. First, top-ranked genes were enriched in AD-associated pathways. Second, using external gene expression data from the Mount Sinai Brain Bank study, we found that the top-ranked genes were significantly associated with neuropathological and clinical traits, including the Consortium to Establish a Registry for Alzheimer’s Disease score, Braak stage score and clinical dementia rating. The analysis of Alzheimer’s brain single-cell RNA-seq data revealed cell-type-specific association of predicted genes with early pathology of AD. Third, by interrogating proteomic data in the Religious Orders Study and Memory and Aging Project and Baltimore Longitudinal Study of Aging studies, we observed a significant association of protein expression level with cognitive function and AD clinical severity. The network, method and predictions could become a valuable resource to advance the identification of risk genes for AD.
Affiliation(s)
- Cui-Xiang Lin
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
- Hong-Dong Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
- Chao Deng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
- Weisheng Liu
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
- Shannon Erhardt
- Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States
- Jun Wang
- Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Daifeng Wang
- Department of Biostatistics and Medical Informatics and Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Bin Hu
- Institute of Engineering Medicine, Beijing Institute of Technology, Beijing, China
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
8
Yang JJ, Grissa D, Lambert CG, Bologa CG, Mathias SL, Waller A, Wild DJ, Jensen LJ, Oprea TI. TIGA: target illumination GWAS analytics. Bioinformatics 2021; 37:3865-3873. [PMID: 34086846 PMCID: PMC11025677 DOI: 10.1093/bioinformatics/btab427] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/31/2020] [Revised: 05/12/2021] [Accepted: 06/03/2021] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores, to aggregate multivariate evidence. This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jeremy J Yang
- Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Integrative Data Science Laboratory, School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47408, USA
- Dhouha Grissa
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
- Christophe G Lambert
- Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Cristian G Bologa
- Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Stephen L Mathias
- Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Anna Waller
- Department of Pathology, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- David J Wild
- Integrative Data Science Laboratory, School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47408, USA
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
- Tudor I Oprea
- Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
9
Lin SH, Brown DW, Machiela MJ. LDtrait: An Online Tool for Identifying Published Phenotype Associations in Linkage Disequilibrium. Cancer Res 2020; 80:3443-3446. [PMID: 32606005 DOI: 10.1158/0008-5472.can-20-0985] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/27/2020] [Revised: 05/05/2020] [Accepted: 06/25/2020] [Indexed: 11/16/2022]
Abstract
Genome-wide association studies (GWAS) have identified thousands of germline susceptibility loci associated with risk for cancer as well as a wide range of other traits and diseases. An interest of many investigators is identifying traits or diseases that share common susceptibility loci. We developed LDtrait (https://ldlink.nci.nih.gov/?tab=ldtrait) as an open access web tool for finding germline variation associated with multiple traits. LDtrait searches the NHGRI-EBI GWAS Catalog to identify susceptibility loci in linkage disequilibrium (LD) with a user-provided list of query variants. Options allow for modifying LD thresholds, calculating LD from a diverse set of reference populations, and downloading annotated variant lists. Results from example query searches highlight the utility of LDtrait in uncovering cross-trait associations for cancer risk and other traits. LDtrait accelerates etiologic understanding of cancer genetics by rapidly identifying genetic similarities with other traits or diseases. SIGNIFICANCE: The new GWAS search tool LDtrait will expedite discovery of shared genetic components underlying seemingly unrelated diseases and may offer novel insights into cancer research.
Affiliation(s)
- Shu-Hong Lin
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland
- Derek W Brown
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland
- Mitchell J Machiela
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland.
|
10
|
Dobriban E. Weighted mining of massive collections of p-values by convex optimization. Information and Inference: A Journal of the IMA 2018; 7:251-275. [PMID: 29930799 PMCID: PMC5998655 DOI: 10.1093/imaiai/iax013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/14/2017] [Accepted: 10/05/2017] [Indexed: 06/08/2023]
Abstract
Researchers in data-rich disciplines, such as computational genomics and observational cosmology, often wish to mine large bodies of p-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the p-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous 'standard' methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
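To make the idea of upweighting and downweighting concrete, here is a minimal sketch of a generic weighted Benjamini-Hochberg step: each p-value is divided by its normalized weight before the usual step-up rule is applied. This is not the Princessp optimization, which is about choosing the weights; the function name, the weights, and the level q are illustrative assumptions.

```python
def weighted_bh(pvalues, weights, q=0.10):
    """Return indices rejected by a weighted Benjamini-Hochberg step-up rule.

    Weights are rescaled to average 1, so upweighting some hypotheses
    necessarily downweights others. All inputs here are hypothetical.
    """
    m = len(pvalues)
    mean_w = sum(weights) / m
    # Weighted p-value: p_i / (w_i / mean_w); zero weight means "never reject".
    adjusted = [min(1.0, p * mean_w / w) if w > 0 else 1.0
                for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: adjusted[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= q * rank / m:
            k = rank                      # largest rank passing the step-up test
    return sorted(order[:k])


# Upweighting the first hypothesis lets it pass a level the second one misses.
print(weighted_bh([0.03, 0.03], [1.9, 0.1], q=0.05))
```

With equal weights the function reduces to the ordinary Benjamini-Hochberg procedure.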
Affiliation(s)
- Edgar Dobriban
- Department of Statistics, The Wharton School, University of Pennsylvania, USA
|
11
|
Fortney K, Dobriban E, Garagnani P, Pirazzini C, Monti D, Mari D, Atzmon G, Barzilai N, Franceschi C, Owen AB, Kim SK. Genome-Wide Scan Informed by Age-Related Disease Identifies Loci for Exceptional Human Longevity. PLoS Genet 2015; 11:e1005728. [PMID: 26677855 PMCID: PMC4683064 DOI: 10.1371/journal.pgen.1005728] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Received: 05/10/2015] [Accepted: 11/16/2015] [Indexed: 11/20/2022] Open
Abstract
We developed a new statistical framework to find genetic variants associated with extreme longevity. The method, informed GWAS (iGWAS), takes advantage of knowledge from large studies of age-related disease in order to narrow the search for SNPs associated with longevity. To gain support for our approach, we first show there is an overlap between loci involved in disease and loci associated with extreme longevity. These results indicate that several disease variants may be depleted in centenarians versus the general population. Next, we used iGWAS to harness information from 14 meta-analyses of disease and trait GWAS to identify longevity loci in two studies of long-lived humans. In a standard GWAS analysis, only one locus in these studies is significant (APOE/TOMM40) when controlling the false discovery rate (FDR) at 10%. With iGWAS, we identify eight genetic loci that associate significantly with exceptional human longevity at FDR < 10%. We followed up the eight lead SNPs in independent cohorts, and found replication evidence of four loci and suggestive evidence for one more with exceptional longevity. The loci that replicated (FDR < 5%) included APOE/TOMM40 (associated with Alzheimer’s disease), CDKN2B/ANRIL (implicated in the regulation of cellular senescence), ABO (tags the O blood group), and SH2B3/ATXN2 (a signaling gene that extends lifespan in Drosophila and a gene involved in neurological disease). Our results implicate new loci in longevity and reveal a genetic overlap between longevity and age-related diseases and traits, including coronary artery disease and Alzheimer’s disease. iGWAS provides a new analytical strategy for uncovering SNPs that influence extreme longevity, and can be applied more broadly to boost power in other studies of complex phenotypes.
Author Summary: Longevity is a complex phenotype, and few genetic variants that affect lifespan have been identified. However, aging and disease are closely related, and a great deal is known about the genetic basis of disease risk. Here, we show using genome-wide association studies (GWAS) of longevity and disease that there is an overlap between loci involved in longevity and loci involved in several diseases, such as Alzheimer’s disease and coronary artery disease. We then develop a new statistical framework to find genetic variants associated with extreme longevity. The method, informed GWAS (iGWAS), takes advantage of knowledge from 14 large studies of disease and disease-related traits in order to narrow the search for SNPs associated with longevity. Using iGWAS, we found eight SNPs that are significant in our discovery cohorts, and we were able to validate four of these in replication studies of long-lived subjects. Our results implicate new loci in longevity and reveal a genetic overlap between longevity and age-related diseases and traits. Beyond the study of human longevity, iGWAS can be applied to boost statistical power in any GWAS of a target phenotype by using larger GWAS of genetically-related conditions.
Affiliation(s)
- Kristen Fortney
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Department of Genetics, Stanford University, Stanford, California, United States of America
| | - Edgar Dobriban
- Department of Statistics, Stanford University, Stanford, California, United States of America
| | - Paolo Garagnani
- Department of Experimental, Diagnostic and Specialty Medicine Experimental Pathology, University of Bologna, Bologna, Italy
- Center for Applied Biomedical Research, St. Orsola-Malpighi University Hospital, Bologna, Italy
| | - Chiara Pirazzini
- Department of Experimental, Diagnostic and Specialty Medicine Experimental Pathology, University of Bologna, Bologna, Italy
- Interdepartmental Centre "L. Galvani" CIG, University of Bologna, Bologna, Italy
| | - Daniela Monti
- Department of Clinical, Experimental and Biomedical Sciences, University of Florence, Florence, Italy
| | - Daniela Mari
- Department of Medical Sciences, University of Milan, Milan, Italy
- Geriatric Unit, IRCCS Ca' Grande Foundation, Maggiore Policlinico Hospital, Milan, Italy
| | - Gil Atzmon
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Nir Barzilai
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Claudio Franceschi
- Department of Experimental, Diagnostic and Specialty Medicine Experimental Pathology, University of Bologna, Bologna, Italy
- IRCCS, Institute of Neurological Sciences of Bologna, Bologna, Italy
| | - Art B. Owen
- Department of Statistics, Stanford University, Stanford, California, United States of America
| | - Stuart K. Kim
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Department of Genetics, Stanford University, Stanford, California, United States of America
|
12
|
Dobriban E, Fortney K, Kim SK, Owen AB. Optimal multiple testing under a Gaussian prior on the effect sizes. Biometrika 2015; 102:753-766. [PMID: 27046938 PMCID: PMC4813057 DOI: 10.1093/biomet/asv050] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 01/26/2023] Open
Abstract
We develop a new method for large-scale frequentist multiple testing with Bayesian prior information. We find optimal p-value weights that maximize the average power of the weighted Bonferroni method. Due to the nonconvexity of the optimization problem, previous methods that account for uncertain prior information are suitable for only a small number of tests. For a Gaussian prior on the effect sizes, we give an efficient algorithm that is guaranteed to find the optimal weights nearly exactly. Our method can discover new loci in genome-wide association studies and compares favourably to competitors. An open-source implementation is available.
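For readers unfamiliar with the weighted Bonferroni method named in this abstract, a minimal sketch follows. Hypothesis i is rejected when its p-value is at most alpha times its share of the total weight, which keeps the family-wise error rate at alpha for fixed weights. The function name, weights, and alpha are illustrative assumptions; the paper's contribution is how to choose the weights, which this sketch does not attempt.

```python
def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Return indices i with p_i <= alpha * w_i / sum(w).

    With weights that average 1 this is the familiar threshold alpha * w_i / m.
    The weights are taken as given (hypothetical values in the example below).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [i for i, (p, w) in enumerate(zip(pvalues, weights))
            if p <= alpha * w / total]


pvals = [0.02, 0.02, 0.2]
print(weighted_bonferroni(pvals, [1, 1, 1]))      # uniform weights: plain Bonferroni
print(weighted_bonferroni(pvals, [2, 0.5, 0.5]))  # prior information favours the first test
```

The example shows the trade-off: the upweighted test gains power, while the downweighted tests face a stricter threshold.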
Affiliation(s)
- Edgar Dobriban
- Department of Statistics, Stanford University, Stanford, California 94305, U.S.A.
| | - Kristen Fortney
- Department of Developmental Biology, Stanford University, Stanford, California 94305, U.S.A.
| | - Stuart K Kim
- Department of Developmental Biology, Stanford University, Stanford, California 94305, U.S.A.
| | - Art B Owen
- Department of Statistics, Stanford University, Stanford, California 94305, U.S.A.
|
13
|
Wang Q, Yang C, Gelernter J, Zhao H. Pervasive pleiotropy between psychiatric disorders and immune disorders revealed by integrative analysis of multiple GWAS. Hum Genet 2015; 134:1195-209. [PMID: 26340901 DOI: 10.1007/s00439-015-1596-8] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 05/31/2015] [Accepted: 08/23/2015] [Indexed: 02/01/2023]
Abstract
Although some existing epidemiological observations and molecular experiments suggested that brain disorders in the realm of psychiatry may be influenced by immune dysregulation, the degree of genetic overlap between psychiatric disorders and immune disorders has not been well established. We investigated this issue by integrative analysis of genome-wide association studies of 18 complex human traits/diseases (five psychiatric disorders, seven immune disorders, and others) and multiple genome-wide annotation resources (central nervous system genes, immune-related expression-quantitative trait loci (eQTL) and DNase I hypersensitive sites from 98 cell lines). We detected pleiotropy in 24 of the 35 psychiatric-immune disorder pairs. The strongest pleiotropy was observed for schizophrenia-rheumatoid arthritis with the MHC region included in the analysis (p = 3.9 x 10^-285), and schizophrenia-Crohn's disease with the MHC region excluded (p = 1.1 x 10^-36). Significant enrichment (> 1.4 fold) of immune-related eQTL was observed in four psychiatric disorders. Genomic regions responsible for pleiotropy between psychiatric disorders and immune disorders were detected. The MHC region on chromosome 6 appears to be the most important, with other regions, such as cytoband 1p13.2, also playing significant roles in pleiotropy. We also found that most alleles shared between schizophrenia and Crohn's disease have the same effect direction, with a similar trend found for other disorder pairs, such as bipolar-Crohn's disease. Our results offer a novel bird's-eye view of the genetic relationship and demonstrate strong evidence for pervasive pleiotropy between psychiatric disorders and immune disorders. Our findings might open new routes for prevention and treatment strategies for these disorders based on a new appreciation of the importance of immunological mechanisms in mediating risk of many psychiatric diseases.
Affiliation(s)
- Qian Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA; VA CT Healthcare Center, West Haven, CT, USA
| | - Can Yang
- VA CT Healthcare Center, West Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Department of Mathematics, Hong Kong Baptist University, Hong Kong, Hong Kong SAR
| | - Joel Gelernter
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA; VA CT Healthcare Center, West Haven, CT, USA; Department of Neurobiology, Yale School of Medicine, New Haven, CT, USA; Department of Genetics, Yale School of Medicine, West Haven, CT, USA
| | - Hongyu Zhao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Department of Genetics, Yale School of Medicine, West Haven, CT, USA; VA Cooperative Studies Program Coordinating Center, West Haven, CT, USA
|
14
|
O'Rielly DD, Uddin M, Codner D, Hayley M, Zhou J, Pena-Castillo L, Mostafa AA, Hasan SMM, Liu W, Haroon N, Inman R, Rahman P. Private rare deletions in SEC16A and MAMDC4 may represent novel pathogenic variants in familial axial spondyloarthritis. Ann Rheum Dis 2015; 75:772-9. [PMID: 25956157 PMCID: PMC4819618 DOI: 10.1136/annrheumdis-2014-206484] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/15/2014] [Accepted: 03/07/2015] [Indexed: 01/10/2023]
Abstract
Objective: Axial spondyloarthritis (AxSpA) represents a group of inflammatory axial diseases that share common clinical and histopathological manifestations. Ankylosing spondylitis (AS) is the best characterised subset of AxSpA, and its genetic basis has been extensively investigated. Given that genome-wide association studies account for only 25% of AS heritability, the objective of this study was to discover rare, highly penetrant genetic variants in AxSpA pathogenesis using a well-characterised, multigenerational family. Methods: HLA-B*27 genotyping and exome sequencing were performed on DNA collected from available family members. Variant frequency was assessed by mining publicly available datasets and using fragment analysis of unrelated AxSpA cases and unaffected controls. Gene expression was measured by qPCR, and protein expression was assessed by western blot analysis and immunofluorescence microscopy using patient-derived B-cell lines. Circular dichroism spectroscopy was performed to assess the impact of discovered variants on secondary structure. Results: This is the first report identifying two rare private familial variants in a multigenerational AxSpA family, an in-frame SEC16A deletion and an out-of-frame MAMDC4 deletion. Evidence suggests that the causative mechanism is, for SEC16A, a conformational change induced by deletion of three highly conserved amino acids from the intrinsically disordered Sec16A N-terminus and, for MAMDC4, RNA-mediated decay. Conclusions: The results suggest that it is the presence of rare syntenic SEC16A and MAMDC4 deletions that increases susceptibility to AxSpA in family members who carry the HLA-B*27 allele.
Affiliation(s)
- Darren D O'Rielly
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
| | - Mohammed Uddin
- Program in Genetics and Genome Biology, The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Dianne Codner
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
| | - Michael Hayley
- Biochemistry Department, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
| | - Jiayi Zhou
- Department of Computer Science, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
| | - Lourdes Pena-Castillo
- Department of Computer Science, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
| | - Ahmed A Mostafa
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
| | - S M Mahmudul Hasan
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
| | - William Liu
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
| | - Nigil Haroon
- Toronto Western Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Robert Inman
- Toronto Western Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Proton Rahman
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
|
15
|
Pletscher-Frankild S, Pallejà A, Tsafou K, Binder JX, Jensen LJ. DISEASES: text mining and data integration of disease-gene associations. Methods 2014; 74:83-9. [PMID: 25484339 DOI: 10.1016/j.ymeth.2014.11.020] [Citation(s) in RCA: 327] [Impact Index Per Article: 32.7] [Received: 01/21/2014] [Revised: 11/15/2014] [Accepted: 11/25/2014] [Indexed: 12/18/2022] Open
Abstract
Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download.
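The co-occurrence scoring described in this abstract can be sketched roughly as follows: a gene-disease pair earns credit whenever both names appear in the same abstract, and extra credit when they appear in the same sentence. The weights, the naive substring matching, and the sentence splitter are simplifying assumptions for illustration; the DISEASES tagger and scoring scheme are more elaborate than this.

```python
import re
from collections import defaultdict


def cooccurrence_scores(abstracts, genes, diseases, w_doc=1.0, w_sent=2.0):
    """Accumulate weighted co-occurrence counts for (gene, disease) pairs.

    w_doc and w_sent are hypothetical weights for abstract-level and
    sentence-level co-mentions; names are matched by lowercase substring.
    """
    scores = defaultdict(float)
    for text in abstracts:
        lowered = text.lower()
        # Naive sentence split on terminal punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", lowered) if s]
        for gene in genes:
            for disease in diseases:
                g, d = gene.lower(), disease.lower()
                if g in lowered and d in lowered:
                    scores[(gene, disease)] += w_doc
                    if any(g in s and d in s for s in sentences):
                        scores[(gene, disease)] += w_sent
    return dict(scores)


docs = [
    "APOE is linked to Alzheimer's disease. TOMM40 is nearby.",
    "TOMM40 was studied. Alzheimer's disease was also studied.",
]
print(cooccurrence_scores(docs, ["APOE", "TOMM40"], ["Alzheimer's disease"]))
```

A raw count like this favours frequently mentioned genes and diseases, which is one reason such scores are usually normalized and combined with other evidence, as the abstract notes.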
Affiliation(s)
- Sune Pletscher-Frankild
- Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Albert Pallejà
- Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Kalliopi Tsafou
- Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Janos X Binder
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany; Bioinformatics Core Facility, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
| | - Lars Juhl Jensen
- Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
|
16
|
Leslie R, O'Donnell CJ, Johnson AD. GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database. Bioinformatics 2014; 30:i185-94. [PMID: 24931982 DOI: 10.1093/bioinformatics/btu273] [Citation(s) in RCA: 179] [Impact Index Per Article: 17.9] [Indexed: 12/16/2022] Open
Abstract
SUMMARY: We created a deeply extracted and annotated database of genome-wide association studies (GWAS) results. GRASP v1.0 contains >6.2 million SNP-phenotype associations from among 1390 GWAS studies. We re-annotated GWAS results with 16 annotation sources including some rarely compared to GWAS results (e.g. RNA editing sites, lincRNAs, PTMs). MOTIVATION: To create a high-quality resource to facilitate further use and interpretation of human GWAS results in order to address important scientific questions. RESULTS: GWAS have grown exponentially, with increases in sample sizes and markers tested, and continuing bias toward European ancestry samples. GRASP contains >100 000 phenotypes, roughly: eQTLs (71.5%), metabolite QTLs (21.2%), methylation QTLs (4.4%) and diseases, biomarkers and other traits (2.8%). cis-eQTLs, meQTLs, mQTLs and MHC region SNPs are highly enriched among significant results. After removing these categories, GRASP still contains a greater proportion of studies and results than comparable GWAS catalogs. Cardiovascular disease and related risk factors predominate among the remaining GWAS results, followed by immunological, neurological and cancer traits. Significant results in GWAS display a highly gene-centric tendency. Sex chromosome X (OR = 0.18 [0.16-0.20]) and Y (OR = 0.003 [0.001-0.01]) genes are depleted for GWAS results. Gene length is correlated with GWAS results at nominal significance (P ≤ 0.05) levels. We show this gene-length correlation decays at increasingly more stringent P-value thresholds. Potential pleiotropic genes and SNPs enriched for multi-phenotype association in GWAS are identified. However, we note possible population stratification at some of these loci. Finally, via re-annotation we identify compelling functional hypotheses at GWAS loci, in some cases unrealized in studies to date.
CONCLUSION: Pooling summary-level GWAS results and re-annotating with bioinformatics predictions and molecular features provides a good platform for new insights. AVAILABILITY: The GRASP database is available at http://apps.nhlbi.nih.gov/grasp.
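The depletion figures above are reported as odds ratios with intervals. As a reminder of how such a value is typically obtained, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table. Whether GRASP used this exact interval is not stated in the abstract, and the counts in the example are invented for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: category genes with a GWAS hit     b: category genes without a hit
    c: other genes with a GWAS hit        d: other genes without a hit
    All four counts must be positive; z = 1.96 gives a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 10 of 100 category genes have hits versus 50 of 100 others.
print(odds_ratio_ci(10, 90, 50, 50))
```

An interval that lies entirely below 1, as in the X and Y chromosome results, indicates depletion relative to the comparison set.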
Affiliation(s)
- Richard Leslie
- Cardiovascular Epidemiology and Human Genomics Branch, National Heart, Lung and Blood Institute, The Framingham Heart Study, Framingham, MA 01702, University of Massachusetts Medical School, Worcester, MA 01655 and Division of Cardiology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Christopher J O'Donnell
- Cardiovascular Epidemiology and Human Genomics Branch, National Heart, Lung and Blood Institute, The Framingham Heart Study, Framingham, MA 01702, University of Massachusetts Medical School, Worcester, MA 01655 and Division of Cardiology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Andrew D Johnson
- Cardiovascular Epidemiology and Human Genomics Branch, National Heart, Lung and Blood Institute, The Framingham Heart Study, Framingham, MA 01702, University of Massachusetts Medical School, Worcester, MA 01655 and Division of Cardiology, Massachusetts General Hospital, Boston, MA 02114, USA
|
17
|
Gustafsson M, Edström M, Gawel D, Nestor CE, Wang H, Zhang H, Barrenäs F, Tojo J, Kockum I, Olsson T, Serra-Musach J, Bonifaci N, Pujana MA, Ernerudh J, Benson M. Integrated genomic and prospective clinical studies show the importance of modular pleiotropy for disease susceptibility, diagnosis and treatment. Genome Med 2014; 6:17. [PMID: 24571673 PMCID: PMC4064311 DOI: 10.1186/gm534] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 11/29/2013] [Accepted: 02/21/2014] [Indexed: 12/17/2022] Open
Abstract
Background: Translational research typically aims to identify and functionally validate individual, disease-specific genes. However, reaching this aim is complicated by the involvement of thousands of genes in common diseases and by the fact that many of those genes are pleiotropic, that is, shared by several diseases. Methods: We integrated genomic meta-analyses with prospective clinical studies to systematically investigate the pathogenic, diagnostic and therapeutic roles of pleiotropic genes. In a novel approach, we first used pathway analysis of all published genome-wide association studies (GWAS) to find a cell type common to many diseases. Results: The analysis showed over-representation of the T helper cell differentiation pathway, which is expressed in T cells. This led us to focus on expression profiling of CD4+ T cells from highly diverse inflammatory and malignant diseases. We found that pleiotropic genes were highly interconnected and formed a pleiotropic module, which was enriched for inflammatory, metabolic and proliferative pathways. The general relevance of this module was supported by highly significant enrichment of genetic variants identified by all GWAS and cancer studies, as well as known diagnostic and therapeutic targets. Prospective clinical studies of multiple sclerosis and allergy showed the importance of both pleiotropic and disease-specific modules for clinical stratification. Conclusions: In summary, this translational genomics study identified a pleiotropic module, which has key pathogenic, diagnostic and therapeutic roles.
Affiliation(s)
- Mika Gustafsson
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Linköping University, 58185 Linköping, Sweden
| | - Måns Edström
- Clinical and Experimental Medicine, Faculty of Health Sciences, Division of Clinical Immunology, Unit of Autoimmunity and Immune Regulation, Linköping University, 58185 Linköping, Sweden
| | - Danuta Gawel
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Linköping University, 58185 Linköping, Sweden
| | - Colm E Nestor
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Linköping University, 58185 Linköping, Sweden
| | - Hui Wang
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Linköping University, 58185 Linköping, Sweden
| | - Huan Zhang
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Linköping University, 58185 Linköping, Sweden
| | - Fredrik Barrenäs
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Linköping University, 58185 Linköping, Sweden
| | - James Tojo
- Department of Clinical Neurosciences, Karolinska Institutet and Centrum for Molecular Medicine, 17177 Stockholm, Sweden
| | - Ingrid Kockum
- Department of Clinical Neurosciences, Karolinska Institutet and Centrum for Molecular Medicine, 17177 Stockholm, Sweden
| | - Tomas Olsson
- Department of Clinical Neurosciences, Karolinska Institutet and Centrum for Molecular Medicine, 17177 Stockholm, Sweden
| | - Jordi Serra-Musach
- Cancer and Systems Biology Unit, Catalan Institute of Oncology, IDIBELL, L'Hospitalet del Llobregat, 08908 Barcelona, Spain
| | - Núria Bonifaci
- Cancer and Systems Biology Unit, Catalan Institute of Oncology, IDIBELL, L'Hospitalet del Llobregat, 08908 Barcelona, Spain
| | - Miguel Angel Pujana
- Cancer and Systems Biology Unit, Catalan Institute of Oncology, IDIBELL, L'Hospitalet del Llobregat, 08908 Barcelona, Spain
| | - Jan Ernerudh
- Clinical and Experimental Medicine, Faculty of Health Sciences, Division of Clinical Immunology, Unit of Autoimmunity and Immune Regulation, Linköping University, 58185 Linköping, Sweden
| | - Mikael Benson
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Linköping University, 58185 Linköping, Sweden
|
18
|
Nayak L, Tunga H, De RK. Disease co-morbidity and the human Wnt signaling pathway: a network-wise study. OMICS: A Journal of Integrative Biology 2013; 17:318-37. [PMID: 23692364 DOI: 10.1089/omi.2012.0053] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
The human Wnt signaling pathway contains 57 genes communicating among themselves by 70 experimentally established associations, as given in the KEGG/PATHWAY database. It is responsible for a variety of crucial biological functions such as regulation of cell fate determination, proliferation, differentiation, migration, and apoptosis. Abnormal behavior of its members causes numerous types of human cancers and dramatic changes in bone mass density that lead to diseases such as osteoporosis-pseudo-glioma syndrome, Van-Buchem disease, skeletal malformation, autosomal dominant sclerosteosis, and osteoporosis type I syndromes. So far, single genes have been investigated for their disease-causing properties, and single diseases have been traced backwards to discover foul play in the system pathways. Differential expression of the whole genome has been mapped by microarray. But how all the genes involved in a pathway affect each other in single or multiple disease states, and whether the presence of one disease state makes a person prone to other diseases (i.e., co-morbidity among diseases associated with a certain important biological pathway), is still unknown. We have developed a human Wnt signaling pathway diseasome and analyzed it to find answers to such questions. Data used in constructing the diseasome can be downloaded from the publicly accessible webserver http://www.isical.ac.in/-rajat/diseasome/index.php.
Affiliation(s)
- Losiana Nayak
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
|
19
|
Cheng L, Wang G, Li J, Zhang T, Xu P, Wang Y. SIDD: a semantically integrated database towards a global view of human disease. PLoS One 2013; 8:e75504. [PMID: 24146757 PMCID: PMC3795748 DOI: 10.1371/journal.pone.0075504] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 04/21/2013] [Accepted: 08/15/2013] [Indexed: 01/08/2023] Open
Abstract
Background: A number of databases have been developed to collect disease-related molecular, phenotypic and environmental features (DR-MPEs), such as genes, non-coding RNAs, genetic variations, drugs, phenotypes and environmental factors. However, each current database focuses on only one or two DR-MPEs. There is an urgent demand for an integrated database that can establish semantic associations among disease-related databases and link them to provide a global view of human disease at the biological level. This database, once developed, will help researchers query various DR-MPEs by disease and investigate disease mechanisms from different types of data. Methodology: To establish an integrated disease-associated database, disease vocabularies used in different databases are mapped to Disease Ontology (DO) through semantic match. 4,284 and 4,186 disease terms from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM), respectively, are mapped to DO. Then, the relationships between DR-MPEs and diseases are extracted and merged from different source databases to reduce data redundancy. Conclusions: A semantically integrated disease-associated database (SIDD) is developed, which integrates 18 disease-associated databases so that researchers can browse multiple types of DR-MPEs in one view. A web interface allows easy navigation for querying information by browsing a disease ontology tree or searching for a disease term. Furthermore, a network visualization tool using the Cytoscape Web plugin has been implemented in SIDD. It makes the relationships between diseases and DR-MPEs easier to view. The current version of SIDD (Jul 2013) documents 4,465,131 entries relating to 139,365 DR-MPEs and to 3,824 human diseases. The database can be freely accessed from: http://mlg.hit.edu.cn/SIDD.
Affiliation(s)
- Liang Cheng
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Guohua Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Jie Li
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Tianjiao Zhang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Peigang Xu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
|
20
|
Patnala R, Clements J, Batra J. Candidate gene association studies: a comprehensive guide to useful in silico tools. BMC Genet 2013; 14:39. [PMID: 23656885 PMCID: PMC3655892 DOI: 10.1186/1471-2156-14-39] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 11/07/2012] [Accepted: 04/15/2013] [Indexed: 01/01/2023] Open
Abstract
The candidate gene approach has been a pioneer in the field of genetic epidemiology, identifying risk alleles and their association with clinical traits. With the advent of rapidly changing technology, there has been an explosion of in silico tools available to researchers, giving them fast, efficient resources and reliable strategies for finding causal gene variants in candidate gene or genome-wide association studies (GWAS). In this review, following a description of candidate gene prioritisation, we summarise the approaches to single nucleotide polymorphism (SNP) prioritisation and discuss the tools available to assess the functional relevance of a risk variant with consideration of its genomic location. The strategy and the tools discussed are applicable to any study investigating genetic risk factors associated with a particular disease. Some of the tools are also applicable to the functional validation of variants relevant to the era of GWAS and next generation sequencing (NGS).
Affiliation(s)
- Radhika Patnala
- Australian Prostate Cancer Research Centre - Queensland, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD 4059, Australia
|
21
|
Feldmann R, Fischer C, Kodelja V, Behrens S, Haas S, Vingron M, Timmermann B, Geikowski A, Sauer S. Genome-wide analysis of LXRα activation reveals new transcriptional networks in human atherosclerotic foam cells. Nucleic Acids Res 2013; 41:3518-31. [PMID: 23393188 PMCID: PMC3616743 DOI: 10.1093/nar/gkt034] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 01/10/2023] Open
Abstract
Increased physiological levels of oxysterols are major risk factors for developing atherosclerosis and cardiovascular disease. Lipid-loaded macrophages, termed foam cells, are important during the early development of atherosclerotic plaques. To test the hypothesis that ligand-based modulation of the nuclear receptor LXRα is crucial for cell homeostasis during atherosclerotic processes, we analysed the genome-wide action of LXRα in foam cells and macrophages. By integrating chromatin immunoprecipitation-sequencing (ChIP-seq) and gene expression profile analyses, we generated a highly stringent set of 186 LXRα target genes. Treatment with the nanomolar-binding ligand T0901317 and subsequent auto-regulatory LXRα activation resulted in sequence-dependent sharpening of the genome-binding patterns of LXRα. LXRα-binding loci that correlated with differential gene expression revealed 32 novel target genes with potential beneficial effects, which in part explained the implications of disease-associated genetic variation data. These observations identified highly integrated LXRα ligand-dependent transcriptional networks, including the APOE/C1/C4/C2-gene cluster, which contribute to the reversal of cholesterol efflux and the dampening of inflammation processes in foam cells to prevent atherogenesis.
Affiliation(s)
- Radmila Feldmann
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
|
22
|
Bianco AM, Marcuzzi A, Zanin V, Girardelli M, Vuch J, Crovella S. Database tools in genetic diseases research. Genomics 2013; 101:75-85. [DOI: 10.1016/j.ygeno.2012.11.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/19/2012] [Revised: 10/26/2012] [Accepted: 11/01/2012] [Indexed: 01/22/2023]
|
23
|
Beck T, Free RC, Thorisson GA, Brookes AJ. Semantically enabling a genome-wide association study database. J Biomed Semantics 2012; 3:9. [PMID: 23244533 PMCID: PMC3579732 DOI: 10.1186/2041-1480-3-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/03/2012] [Accepted: 08/22/2012] [Indexed: 01/03/2023] Open
Abstract
Background: The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This affects the work of GWAS Central, a free and open access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, call for reflection on the most appropriate solution for the organisation of human phenotype data. The Semantic Web provides standards for the possibility of further integration of GWAS data and the ability to contribute to the web of Linked Data.
Results: A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, from querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on the rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications.
Conclusions: We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstruction of terms may be required to facilitate automatic phenotype comparisons. The provision of GWAS nanopublications enables a new dimension for exploring GWAS data, by way of intrinsic links to related data resources within the Linked Data web. The value of such annotation and integration will grow as more biomedical resources adopt the standards of the Semantic Web.
Affiliation(s)
- Tim Beck
- Department of Genetics, University of Leicester, University Road, Leicester, UK.
|