1
|
Human Genetics and Genomics for Drug Target Identification and Prioritization: Open Targets' Perspective. Annu Rev Biomed Data Sci 2024. [PMID: 38608311 DOI: 10.1146/annurev-biodatasci-102523-103838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2024]
Abstract
Open Targets, a consortium among academic and industry partners, focuses on using human genetics and genomics to provide insights to key questions that build therapeutic hypotheses. Large-scale experiments generate foundational data, and open-source informatic platforms systematically integrate evidence for target-disease relationships and provide dynamic tooling for target prioritization. A locus-to-gene machine learning model uses evidence from genome-wide association studies (GWAS Catalog, UK BioBank, and FinnGen), functional genomic studies, epigenetic studies, and variant effect prediction to predict potential drug targets for complex diseases. These predictions are combined with genetic evidence from gene burden analyses, rare disease genetics, somatic mutations, perturbation assays, pathway analyses, scientific literature, differential expression, and mouse models to systematically build target-disease associations (https://platform.opentargets.org). Scored target attributes such as clinical precedence, tractability, and safety guide target prioritization. Here we provide our perspective on the value and impact of human genetics and genomics for generating therapeutic hypotheses.
|
2
|
FORGEdb: a tool for identifying candidate functional variants and uncovering target genes and mechanisms for complex diseases. Genome Biol 2024; 25:3. [PMID: 38167104 PMCID: PMC10763681 DOI: 10.1186/s13059-023-03126-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 11/27/2023] [Indexed: 01/05/2024] Open
Abstract
The majority of disease-associated variants identified through genome-wide association studies are located outside of protein-coding regions. Prioritizing candidate regulatory variants and gene targets to identify potential biological mechanisms for further functional experiments can be challenging. To address this challenge, we developed FORGEdb ( https://forgedb.cancer.gov/ ; https://forge2.altiusinstitute.org/files/forgedb.html ; and https://doi.org/10.5281/zenodo.10067458 ), a standalone and web-based tool that integrates multiple datasets, delivering information on associated regulatory elements, transcription factor binding sites, and target genes for over 37 million variants. FORGEdb scores provide researchers with a quantitative assessment of the relative importance of each variant for targeted functional experiments.
|
3
|
|
4
|
Integrative GWAS and co-localisation analysis suggests novel genes associated with age-related multimorbidity. Sci Data 2023; 10:655. [PMID: 37749083 PMCID: PMC10520009 DOI: 10.1038/s41597-023-02513-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 08/22/2023] [Indexed: 09/27/2023] Open
Abstract
Advancing age is the greatest risk factor for developing multiple age-related diseases. Therapeutic approaches targeting the underlying pathways of ageing, rather than individual diseases, may be an effective way to treat and prevent age-related morbidity while reducing the burden of polypharmacy. We harness the Open Targets Genetics Portal to perform a systematic analysis of nearly 1,400 genome-wide association studies (GWAS) mapped to 34 age-related diseases and traits, identifying genetic signals that are shared between two or more of these traits. Using locus-to-gene (L2G) mapping, we identify 995 targets with shared genetic links to age-related diseases and traits, which are enriched in mechanisms of ageing and include known ageing and longevity-related genes. Of these 995 genes, 128 are the target of an approved or investigational drug, 526 have experimental evidence of binding pockets or are predicted to be tractable, and 341 have no existing tractability evidence, representing underexplored genes which may reveal novel biological insights and therapeutic opportunities. We present these candidate targets for exploration and prioritisation in a web application.
|
5
|
Developing a cluster-based approach for deciphering complexity in individuals with neurodevelopmental differences. Front Pediatr 2023; 11:1171920. [PMID: 37790694 PMCID: PMC10543689 DOI: 10.3389/fped.2023.1171920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/01/2023] [Indexed: 10/05/2023] Open
Abstract
Objective Individuals with neurodevelopmental disorders such as global developmental delay (GDD) present both genotypic and phenotypic heterogeneity. This diversity has hampered the development of targeted interventions, given the relative rarity of each individual genetic etiology. Basket trials, a novel approach to clinical trials in which distinct but related diseases are treated by a common drug, have shown benefits in oncology but have yet to be used in GDD. However, it remains unclear how individuals with GDD could be clustered. Here, we assess two different approaches: agglomerative and divisive clustering. Methods Using the Deciphering Developmental Disorders (DDD) cohort, the largest systematically characterized cohort of individuals with GDD, we extracted genotypic and phenotypic information from 6,588 individuals with GDD. We then used k-means clustering (divisive) and hierarchical agglomerative clustering (HAC) to identify subgroups of individuals. Next, we extracted gene network and molecular function information for the clusters identified by each approach. Results HAC based on phenotypes identified in individuals with GDD revealed 16 clusters, each presenting with one dominant phenotype displayed by most individuals in the cluster, along with other minor phenotypes. Among the most common phenotypes reported were delayed speech, absent speech, and seizure. Interestingly, each phenotypic cluster included several (3-12) gene sub-networks of more closely related genes with diverse molecular functions. k-means clustering also segregated individuals harboring those phenotypes, but the genetic pathways identified were different from the ones identified by HAC. Conclusion Our study illustrates how divisive (k-means) and agglomerative clustering can be used to group individuals with GDD for future basket trials. Moreover, the results of our analysis suggest that phenotypic clusters should be subdivided into molecular sub-networks for an increased likelihood of successful treatment. Finally, a combination of both agglomerative and divisive clustering may be required to develop a comprehensive treatment.
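The two clustering strategies named in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the binary phenotype vectors, the squared Euclidean distance, the average-linkage rule, and the deterministic k-means initialisation are all assumptions chosen to keep the example small and reproducible.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def agglomerative(points, k):
    """Average-linkage HAC: start from singletons, merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(sq_dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

# Hypothetical individuals encoded as [delayed speech, seizure, macrocephaly].
patients = [(1, 0, 0), (0, 1, 0), (1, 0, 0), (1, 0, 1), (0, 1, 1), (0, 1, 0)]
kmeans_labels = kmeans(patients, 2)
hac_labels = agglomerative(patients, 2)
```

On this toy input both methods separate the speech-dominant from the seizure-dominant individuals. On real cohort data the two partitions can differ, which is consistent with the abstract's observation that the pathways recovered by each method were not the same.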
|
6
|
Sex difference contributes to phenotypic diversity in individuals with neurodevelopmental disorders. Front Pediatr 2023; 11:1172154. [PMID: 37609366 PMCID: PMC10441218 DOI: 10.3389/fped.2023.1172154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 07/20/2023] [Indexed: 08/24/2023] Open
Abstract
Objective To gain a better understanding of sex-specific differences in individuals with global developmental delay (GDD), with a focus on phenotypes and genotypes. Methods Using the Deciphering Developmental Disorders (DDD) dataset, we extracted phenotypic information from 6,588 individuals with GDD and then identified statistically significant variations in phenotypes and genotypes based on sex. We compared genes with pathogenic variants between sexes and then performed gene network and molecular function enrichment analysis and gene expression profiling between sexes. Finally, we contrasted individuals with autism as an associated condition. Results We identified phenotypes that differed significantly between male and female individuals with GDD. Autism and macrocephaly were significantly more common in males, whereas microcephaly and stereotypies were more common in females. Importantly, 66% of GDD genes with pathogenic variants overlapped between both sexes. In the cohort, males presented with only slightly more X-linked genes than females (9% vs. 8%, respectively). Individuals of both sexes harbored a similar number of pathogenic variants overall (3), but females presented with a significantly higher load for GDD genes with high intolerance to loss of function. Sex differences in gene expression correlated with genes identified in a sex-specific manner. While we identified sex-specific GDD gene mutations, their pathways overlapped. Interestingly, in individuals with GDD and co-morbid autism phenotypes, we observed a distinct mutation load, pathways, and phenotypic presentation. Conclusion Our study shows for the first time that males and females with GDD present with significantly different phenotypes. Moreover, while most GDD genes overlapped, some genes were found uniquely in each sex. Surprisingly, they shared similar molecular functions.
Sorting genes by predicted intolerance to loss of function (pLI) identified an increased mutation load in females with GDD, potentially suggesting a tolerance to GDD genes of higher pLI compared to overall GDD genes. Finally, we show that considering associated conditions (for instance autism) may influence the genomic underpinning found in individuals with GDD and highlight the importance of comprehensive phenotyping.
|
7
|
Democratizing knowledge representation with BioCypher. Nat Biotechnol 2023; 41:1056-1059. [PMID: 37337100 DOI: 10.1038/s41587-023-01848-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2023]
|
8
|
Future prospects for human genetics and genomics in drug discovery. Curr Opin Struct Biol 2023; 80:102568. [PMID: 36963162 PMCID: PMC7614359 DOI: 10.1016/j.sbi.2023.102568] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/30/2022] [Revised: 01/27/2023] [Accepted: 02/13/2023] [Indexed: 03/26/2023]
Abstract
Evidence from human genetics supporting the therapeutic hypothesis increases the likelihood that a drug will succeed in clinical trials. Rare and common disease genetics yield a wide array of alleles with a range of effect sizes that can proxy for the effect of a drug in disease. Recent advances in large scale population collections and whole genome sequencing approaches have provided a rich resource of human genetic evidence to support drug target selection. As the range of phenotypes profiled increases and ever more alleles are discovered across world-wide populations, these approaches will increasingly influence multiple stages across the lifespan of a drug discovery programme.
|
9
|
Network expansion of genetic associations defines a pleiotropy map of human cell biology. Nat Genet 2023; 55:389-398. [PMID: 36823319 PMCID: PMC10011132 DOI: 10.1038/s41588-023-01327-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 07/19/2021] [Accepted: 01/30/2023] [Indexed: 02/25/2023]
Abstract
Interacting proteins tend to have similar functions, influencing the same organismal traits. Interaction networks can be used to expand the list of candidate trait-associated genes from genome-wide association studies. Here, we performed network-based expansion of trait-associated genes for 1,002 human traits showing that this recovers known disease genes or drug targets. The similarity of network expansion scores identifies groups of traits likely to share an underlying genetic and biological process. We identified 73 pleiotropic gene modules linked to multiple traits, enriched in genes involved in processes such as protein ubiquitination and RNA processing. In contrast to gene deletion studies, pleiotropy as defined here captures specifically multicellular-related processes. We show examples of modules linked to human diseases enriched in genes with known pathogenic variants that can be used to map targets of approved drugs for repurposing. Finally, we illustrate the use of network expansion scores to study genes at inflammatory bowel disease genome-wide association study loci, and implicate inflammatory bowel disease-relevant genes with strong functional and genetic support.
|
10
|
The next-generation Open Targets Platform: reimagined, redesigned, rebuilt. Nucleic Acids Res 2023; 51:D1353-D1359. [PMID: 36399499 PMCID: PMC9825572 DOI: 10.1093/nar/gkac1046] [Citation(s) in RCA: 57] [Impact Index Per Article: 57.0] [Received: 09/15/2022] [Revised: 10/14/2022] [Accepted: 10/27/2022] [Indexed: 11/19/2022] Open
Abstract
The Open Targets Platform (https://platform.opentargets.org/) is an open source resource to systematically assist drug target identification and prioritisation using publicly available data. Since our last update, we have reimagined, redesigned, and rebuilt the Platform in order to streamline data integration and harmonisation, expand the ways in which users can explore the data, and improve the user experience. The gene-disease causal evidence has been enhanced and expanded to better capture disease causality across rare, common, and somatic diseases. For target and drug annotations, we have incorporated new features that help assess target safety and tractability, including genetic constraint, PROTACtability assessments, and AlphaFold structure predictions. We have also introduced new machine learning applications for knowledge extraction from the published literature, clinical trial information, and drug labels. The new technologies and frameworks introduced since the last update will ease the introduction of new features and the creation of separate instances of the Platform adapted to user requirements. Our new Community forum, expanded training materials, and outreach programme support our users in a range of use cases.
|
11
|
|
12
|
Multi-ancestry Mendelian randomization of omics traits revealing drug targets of COVID-19 severity. EBioMedicine 2022; 81:104112. [PMID: 35772218 PMCID: PMC9235320 DOI: 10.1016/j.ebiom.2022.104112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 05/16/2022] [Accepted: 05/28/2022] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Recent omic studies prioritised several drug targets associated with coronavirus disease 2019 (COVID-19) severity. However, little evidence was provided to systematically estimate the effect of drug targets on COVID-19 severity in multiple ancestries. METHODS In this study, we applied Mendelian randomization (MR) and colocalization approaches to understand the putative causal effects of 16,059 transcripts and 1,608 proteins on COVID-19 severity in European ancestry and the effects of 610 proteins on COVID-19 severity in African ancestry. We further integrated genetic, clinical and literature evidence to prioritise drug targets. Additional sensitivity analyses including multi-trait colocalization and phenome-wide MR were conducted to test for MR assumptions. FINDINGS MR and colocalization prioritized four protein targets, FCRL3, ICAM5, ENTPD5 and OAS1, that showed an effect on COVID-19 severity in European ancestry. One protein target, SERPINA1, showed a stronger effect in African ancestry but a much weaker effect in European ancestry (odds ratio [OR] in Africans = 0.369, 95% CI = 0.203 to 0.668, P = 9.96 × 10⁻⁴; OR in Europeans = 1.021, 95% CI = 0.901 to 1.157, P = 0.745), which suggested that an increased level of SERPINA1 will reduce COVID-19 risk in African ancestry. One protein, ICAM1, showed a suggestive effect on COVID-19 severity in both ancestries (OR in Europeans = 1.152, 95% CI = 1.063 to 1.249, P = 5.94 × 10⁻⁴; OR in Africans = 1.481, 95% CI = 1.008 to 2.176, P = 0.045). The OAS1, SERPINA1 and ICAM1 effects were replicated using updated COVID-19 severity data in the two ancestries respectively, where alternative splicing events in OAS1 and ICAM1 also showed marginal effects on COVID-19 severity in Europeans. The phenome-wide MR of the prioritised targets on 622 complex traits provided information on potential beneficial effects on other diseases and suggested little evidence of adverse effects on major complications.
INTERPRETATION Our study identified six proteins as showing putative causal effects on COVID-19 severity. OAS1 and SERPINA1 were targets of existing drugs in trials as potential COVID-19 treatments. ICAM1, ICAM5 and FCRL3 are related to the immune system. Across the six targets, OAS1 has no reliable instrument in African ancestry; SERPINA1, FCRL3, ICAM5 and ENTPD5 showed different levels of putative causal evidence in European and African ancestries, which highlights the importance of more powerful ancestry-specific GWAS and the value of multi-ancestry MR in informing the effects of drug targets on COVID-19 across different populations. This study provides a first step towards clinical investigation of beneficial and adverse effects of COVID-19 drug targets. FUNDING No.
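The odds ratios, 95% confidence intervals and P values reported for SERPINA1 can be related to one another with a short worked calculation. The sketch below assumes a normal (Wald) approximation on the log-odds scale, which may not be exactly how the authors derived their P values; the function name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-OR standard error, z statistic and two-sided P value
    from an odds ratio and its 95% CI, assuming a symmetric Wald interval
    on the log scale (an assumption, not stated in the abstract)."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# SERPINA1 estimates as reported in the abstract.
se_afr, z_afr, p_afr = wald_from_ci(0.369, 0.203, 0.668)  # African ancestry
se_eur, z_eur, p_eur = wald_from_ci(1.021, 0.901, 1.157)  # European ancestry
```

Under this approximation the recovered P values land close to the reported 9.96 × 10⁻⁴ and 0.745. The negative z statistic in African ancestry corresponds to the protective direction described in the abstract (OR below 1).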
|
13
|
Whole-exome sequencing identifies rare genetic variants associated with human plasma metabolites. Am J Hum Genet 2022; 109:1038-1054. [PMID: 35568032 PMCID: PMC9247822 DOI: 10.1016/j.ajhg.2022.04.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/19/2021] [Accepted: 04/13/2022] [Indexed: 12/11/2022] Open
Abstract
Metabolite levels measured in the human population are endophenotypes for biological processes. We combined sequencing data for 3,924 (whole-exome sequencing, WES, discovery) and 2,805 (whole-genome sequencing, WGS, replication) donors from a prospective cohort of blood donors in England. We used multiple approaches to select and aggregate rare genetic variants (minor allele frequency [MAF] < 0.1%) in protein-coding regions and tested their associations with 995 metabolites measured in plasma by using ultra-high-performance liquid chromatography-tandem mass spectrometry. We identified 40 novel associations implicating rare coding variants (27 genes and 38 metabolites), of which 28 (15 genes and 28 metabolites) were replicated. We developed algorithms to prioritize putative driver variants at each locus and used mediation and Mendelian randomization analyses to test the directionality of associations between metabolite and protein levels at the ACY1 locus. Overall, 66% of reported associations implicate gene targets of approved drugs or bioactive drug-like compounds, contributing to drug target validation efforts.
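The idea of selecting and aggregating rare variants within a gene can be sketched with a simple burden score. This is only one possible aggregation and is not taken from the paper: the function name, the genotype encoding (alternate-allele counts of 0, 1 or 2 per donor) and the assumption that the alternate allele is the minor allele are all illustrative.

```python
def rare_variant_burden(genotypes, maf_threshold=0.001):
    """Sum rare alternate alleles per donor across the variants of one gene.

    genotypes: one list per variant, holding each donor's alternate-allele
    count (0, 1 or 2). Assumes the alternate allele is the minor allele.
    """
    n_donors = len(genotypes[0])
    rare = []
    for variant in genotypes:
        allele_freq = sum(variant) / (2 * n_donors)
        # Keep observed variants below the MAF cutoff (0.1% by default).
        if 0 < allele_freq < maf_threshold:
            rare.append(variant)
    return [sum(v[i] for v in rare) for i in range(n_donors)]

# Hypothetical gene with three variants genotyped in 1,000 donors.
n = 1000
singleton_a = [1] + [0] * (n - 1)            # frequency 1/2000, rare
common = [1] * 200 + [0] * (n - 200)         # frequency 0.1, excluded
singleton_b = [0] * 5 + [1] + [0] * (n - 6)  # frequency 1/2000, rare
burden = rare_variant_burden([singleton_a, common, singleton_b])
```

The resulting per-donor score could then be tested against a metabolite level in a regression. The common variant contributes nothing to the score because it fails the frequency filter.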
|
14
|
Immune disease variants modulate gene expression in regulatory CD4+ T cells. Cell Genomics 2022; 2. [PMID: 35591976 PMCID: PMC9010307 DOI: 10.1016/j.xgen.2022.100117] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/05/2021] [Revised: 11/02/2021] [Accepted: 03/15/2022] [Indexed: 12/30/2022]
Abstract
Identifying cellular functions dysregulated by disease-associated variants could implicate novel pathways for drug targeting or modulation in cell therapies. However, follow-up studies can be challenging if disease-relevant cell types are difficult to sample. Variants associated with immune diseases point toward the role of CD4+ regulatory T cells (Treg cells). We mapped genetic regulation (quantitative trait loci [QTL]) of gene expression and chromatin activity in Treg cells, and we identified 133 colocalizing loci with immune disease variants. Colocalizations of immune disease genome-wide association study (GWAS) variants with expression QTLs (eQTLs) controlling the expression of CD28 and STAT5A, involved in Treg cell activation and interleukin-2 (IL-2) signaling, support the contribution of Treg cells to the pathobiology of immune diseases. Finally, we identified seven known drug targets suitable for drug repurposing and suggested 63 targets with drug tractability evidence among the GWAS signals that colocalized with Treg cell QTLs. Our study is the first in-depth characterization of immune disease variant effects on Treg cell gene expression modulation and dysregulation of Treg cell function.
|
15
|
Integrative analysis of 3604 GWAS reveals multiple novel cell type-specific regulatory associations. Genome Biol 2022; 23:13. [PMID: 34996498 PMCID: PMC8742386 DOI: 10.1186/s13059-021-02560-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/31/2020] [Accepted: 11/26/2021] [Indexed: 01/02/2023] Open
Abstract
Background Genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) are known to preferentially co-locate to active regulatory elements in tissues and cell types relevant to disease aetiology. Further characterisation of associated cell type-specific regulation can broaden our understanding of how GWAS signals may contribute to disease risk. Results To gain insight into potential functional mechanisms underlying GWAS associations, we developed FORGE2 (https://forge2.altiusinstitute.org/), which is an updated version of the FORGE web tool. FORGE2 uses an expanded atlas of cell type-specific regulatory element annotations, including DNase I hotspots, five histone mark categories and 15 hidden Markov model (HMM) chromatin states, to identify tissue- and cell type-specific signals. An analysis of 3,604 GWAS from the NHGRI-EBI GWAS catalogue yielded at least one significant disease/trait-tissue association for 2,057 GWAS, including > 400 associations specific to epigenomic marks in immune tissues and cell types, > 30 associations specific to heart tissue, and > 60 associations specific to brain tissue, highlighting the key potential of tissue- and cell type-specific regulatory elements. Importantly, we demonstrate that FORGE2 analysis can separate previously observed accessible chromatin enrichments into different chromatin states, such as enhancers or active transcription start sites, providing a greater understanding of underlying regulatory mechanisms. Interestingly, tissue-specific enrichments for repressive chromatin states and histone marks were also detected, suggesting a role for tissue-specific repressed regions in GWAS-mediated disease aetiology. Conclusion In summary, we demonstrate that FORGE2 has the potential to uncover previously unreported disease-tissue associations and identify new candidate mechanisms. FORGE2 is a transparent, user-friendly web tool for the integrative analysis of loci discovered from GWAS. 
Supplementary Information The online version contains supplementary material available at 10.1186/s13059-021-02560-3.
|
16
|
Abstract
Proteolysis-targeting chimeras (PROTACs) are an emerging drug modality that may offer new opportunities to circumvent some of the limitations associated with traditional small-molecule therapeutics. By analogy with the concept of the 'druggable genome', the question arises as to which potential drug targets PROTAC-mediated protein degradation might be most applicable. Here, we present a systematic approach to the assessment of the PROTAC tractability (PROTACtability) of protein targets using a series of criteria based on data and information from a diverse range of relevant publicly available resources. Our approach could support decision-making on whether or not a particular target may be amenable to modulation using a PROTAC. Using our approach, we identified 1,067 proteins of the human proteome that have not yet been described in the literature as PROTAC targets and that offer potential opportunities for future PROTAC-based efforts.
|
17
|
A proteome-wide genetic investigation identifies several SARS-CoV-2-exploited host targets of clinical relevance. eLife 2021; 10:e69719. [PMID: 34402426 PMCID: PMC8457835 DOI: 10.7554/elife.69719] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/23/2021] [Accepted: 08/07/2021] [Indexed: 12/16/2022] Open
Abstract
Background The virus SARS-CoV-2 can exploit biological vulnerabilities (e.g. host proteins) in susceptible hosts that predispose to the development of severe COVID-19. Methods To identify host proteins that may contribute to the risk of severe COVID-19, we undertook proteome-wide genetic colocalisation tests, and polygenic (pan) and cis-Mendelian randomisation analyses leveraging publicly available protein and COVID-19 datasets. Results Our analytic approach identified several known targets (e.g. ABO, OAS1), but also nominated new proteins such as soluble Fas (colocalisation probability > 0.9, p = 1 × 10⁻⁴), implicating Fas-mediated apoptosis as a potential target for COVID-19 risk. The polygenic (pan) and cis-Mendelian randomisation analyses showed consistent associations of genetically predicted ABO protein with several COVID-19 phenotypes. The ABO signal is highly pleiotropic, and a look-up of proteins associated with the ABO signal revealed that the strongest association was with soluble CD209. We demonstrated experimentally that CD209 directly interacts with the spike protein of SARS-CoV-2, suggesting a mechanism that could explain the ABO association with COVID-19. Conclusions Our work provides a prioritised list of host targets potentially exploited by SARS-CoV-2 and is a precursor for further research on CD209 and FAS as therapeutically tractable targets for COVID-19. Funding MAK, JSc, JH, AB, DO, MC, EMM, MG, ID were funded by Open Targets. J.Z. and T.R.G were funded by the UK Medical Research Council Integrative Epidemiology Unit (MC_UU_00011/4). JSh and GJW were funded by the Wellcome Trust Grant 206194. This research was funded in part by the Wellcome Trust [Grant 206194]. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
|
18
|
Open Targets Platform: supporting systematic drug-target identification and prioritisation. Nucleic Acids Res 2021; 49:D1302-D1310. [PMID: 33196847 PMCID: PMC7779013 DOI: 10.1093/nar/gkaa1027] [Citation(s) in RCA: 192] [Impact Index Per Article: 64.0] [Received: 09/15/2020] [Revised: 10/14/2020] [Accepted: 11/11/2020] [Indexed: 02/07/2023] Open
Abstract
The Open Targets Platform (https://www.targetvalidation.org/) provides users with a queryable knowledgebase and user interface to aid systematic target identification and prioritisation for drug discovery based upon underlying evidence. It is publicly available and the underlying code is open source. Since our last update two years ago, we have had 10 releases to maintain and continuously improve evidence for target-disease relationships from 20 different data sources. In addition, we have integrated new evidence from key datasets, including prioritised targets identified from genome-wide CRISPR knockout screens in 300 cancer models (Project Score), and GWAS/UK BioBank statistical genetic analysis evidence from the Open Targets Genetics Portal. We have evolved our evidence scoring framework to improve target identification. To aid the prioritisation of targets and inform on the potential impact of modulating a given target, we have added evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety. We have also developed the user interface and backend technologies to improve performance and usability. In this article, we describe the latest enhancements to the Platform, to address the fundamental challenge that developing effective and safe drugs is difficult and expensive.
|
19
|
Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res 2021; 49:D1311-D1320. [PMID: 33045747 PMCID: PMC7778936 DOI: 10.1093/nar/gkaa840] [Citation(s) in RCA: 210] [Impact Index Per Article: 70.0] [Received: 08/14/2020] [Revised: 09/16/2020] [Accepted: 09/17/2020] [Indexed: 01/22/2023] Open
Abstract
Open Targets Genetics (https://genetics.opentargets.org) is an open-access integrative resource that aggregates human GWAS and functional genomics data including gene expression, protein abundance, chromatin interaction and conformation data from a wide range of cell types and tissues to make robust connections between GWAS-associated loci, variants and likely causal genes. This enables systematic identification and prioritisation of likely causal variants and genes across all published trait-associated loci. In this paper, we describe the public resources we aggregate, the technology and analyses we use, and the functionality that the portal offers. Open Targets Genetics can be searched by variant, gene or study/phenotype. It offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue. Data visualizations such as Manhattan-like plots, regional plots, credible sets overlap between studies and PheWAS plots enable users to explore GWAS signals in depth. The integrated data is made available through the web portal, for bulk download and via a GraphQL API, and the software is open source. Applications of this integrated data include identification of novel targets for drug discovery and drug repurposing.
|
20
|
|
21
|
Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms. Science 2020; 370:eabe9403. [PMID: 33060197 PMCID: PMC7808408 DOI: 10.1126/science.abe9403] [Citation(s) in RCA: 427] [Impact Index Per Article: 106.8] [Received: 09/24/2020] [Accepted: 10/12/2020] [Indexed: 01/18/2023]
Abstract
The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a grave threat to public health and the global economy. SARS-CoV-2 is closely related to the more lethal but less transmissible coronaviruses SARS-CoV-1 and Middle East respiratory syndrome coronavirus (MERS-CoV). Here, we have carried out comparative viral-human protein-protein interaction and viral protein localization analyses for all three viruses. Subsequent functional genetic screening identified host factors that functionally impinge on coronavirus proliferation, including Tom70, a mitochondrial chaperone protein that interacts with both SARS-CoV-1 and SARS-CoV-2 ORF9b, an interaction we structurally characterized using cryo-electron microscopy. Combining genetically validated host factors with both COVID-19 patient genetic data and medical billing records identified molecular mechanisms and potential drug treatments that merit further molecular and clinical study.
|
22
|
The open targets post-GWAS analysis pipeline. Bioinformatics 2020; 36:2936-2937. [PMID: 31930349 PMCID: PMC7203748 DOI: 10.1093/bioinformatics/btaa020] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2019] [Revised: 12/19/2019] [Accepted: 01/09/2020] [Indexed: 11/17/2022] Open
Abstract
Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.
|
23
|
eFORGE v2.0: updated analysis of cell type-specific signal in epigenomic data. Bioinformatics 2020; 35:4767-4769. [PMID: 31161210 PMCID: PMC6853678 DOI: 10.1093/bioinformatics/btz456] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2018] [Revised: 04/24/2019] [Accepted: 05/29/2019] [Indexed: 12/31/2022] Open
Abstract
Summary: The Illumina Infinium EPIC BeadChip is a new high-throughput array for DNA methylation analysis, extending the earlier 450k array by over 400 000 new sites. Previously, a method named eFORGE was developed to provide insights into cell type-specific and cell-composition effects for 450k data. Here, we present a significantly updated and improved version of eFORGE that can analyze both EPIC and 450k array data. New features include analysis of chromatin states, transcription factor motifs and DNase I footprints, providing tools for epigenome-wide association study interpretation and epigenome editing. Availability and implementation: eFORGE v2.0 is implemented as a web tool available from https://eforge.altiusinstitute.org and https://eforge-tf.altiusinstitute.org/. Supplementary information: Supplementary data are available at Bioinformatics online.
|
24
|
Open Targets Platform: new developments and updates two years on. Nucleic Acids Res 2020; 47:D1056-D1065. [PMID: 30462303 PMCID: PMC6324073 DOI: 10.1093/nar/gky1133] [Citation(s) in RCA: 269] [Impact Index Per Article: 67.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Accepted: 10/26/2018] [Indexed: 12/22/2022] Open
Abstract
The Open Targets Platform integrates evidence from genetics, genomics, transcriptomics, drugs, animal models and scientific literature to score and rank target-disease associations for drug target identification. The associations are displayed in an intuitive user interface (https://www.targetvalidation.org), and are available through a REST-API (https://api.opentargets.io/v3/platform/docs/swagger-ui) and a bulk download (https://www.targetvalidation.org/downloads/data). In addition to target-disease associations, we also aggregate and display data at the target and disease levels to aid target prioritisation. Since our first publication two years ago, we have made eight releases, added new data sources for target-disease associations, started including causal genetic variants from non genome-wide targeted arrays, added new target and disease annotations, launched new visualisations and improved existing ones and released a new web tool for batch search of up to 200 targets. We have a new URL for the Open Targets Platform REST-API, new REST endpoints and also removed the need for authorisation for API fair use. Here, we present the latest developments of the Open Targets Platform, expanding the evidence and target-disease associations with new and improved data sources, refining data quality, enhancing website usability, and increasing our user base with our training workshops, user support, social media and bioinformatics forum engagement.
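As a rough illustration of the REST access described in this abstract, the sketch below only builds a query URL and does not contact any server. The base URL is inferred from the Swagger link above; the `/public/association/filter` path and the `target`, `disease` and `size` parameters are assumptions for illustration, the identifiers are placeholders, and the v3 service may no longer be available.

```python
from urllib.parse import urlencode

# Base URL inferred from the Swagger link in the abstract (assumption).
BASE_URL = "https://api.opentargets.io/v3/platform"


def association_url(target_id, disease_id=None, size=10):
    """Build a URL that would ask for target-disease associations.

    The endpoint path and parameter names are assumptions for illustration;
    check the service's own documentation before relying on them.
    """
    params = {"target": target_id, "size": size}
    if disease_id is not None:
        params["disease"] = disease_id
    return f"{BASE_URL}/public/association/filter?{urlencode(params)}"


# Placeholder Ensembl gene and EFO disease identifiers.
url = association_url("ENSG00000157764", disease_id="EFO_0000616")
print(url)
```

Keeping URL construction separate from the network call makes the request easy to inspect before it is sent.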
|
25
|
Abstract
Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials. Applications have ranged in context and methodology, with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.
|
26
|
Bi-allelic Loss-of-Function CACNA1B Mutations in Progressive Epilepsy-Dyskinesia. Am J Hum Genet 2019; 104:948-956. [PMID: 30982612 DOI: 10.1016/j.ajhg.2019.03.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2018] [Accepted: 03/04/2019] [Indexed: 12/11/2022] Open
Abstract
The occurrence of non-epileptic hyperkinetic movements in the context of developmental epileptic encephalopathies is an increasingly recognized phenomenon. Identification of causative mutations provides an important insight into common pathogenic mechanisms that cause both seizures and abnormal motor control. We report bi-allelic loss-of-function CACNA1B variants in six children from three unrelated families whose affected members present with a complex and progressive neurological syndrome. All affected individuals presented with epileptic encephalopathy, severe neurodevelopmental delay (often with regression), and a hyperkinetic movement disorder. Additional neurological features included postnatal microcephaly and hypotonia. Five children died in childhood or adolescence (mean age of death: 9 years), mainly as a result of secondary respiratory complications. CACNA1B encodes the pore-forming subunit of the pre-synaptic neuronal voltage-gated calcium channel Cav2.2/N-type, crucial for SNARE-mediated neurotransmission, particularly in the early postnatal period. Bi-allelic loss-of-function variants in CACNA1B are predicted to cause disruption of Ca2+ influx, leading to impaired synaptic neurotransmission. The resultant effect on neuronal function is likely to be important in the development of involuntary movements and epilepsy. Overall, our findings provide further evidence for the key role of Cav2.2 in normal human neurodevelopment.
|
27
|
GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat Genet 2019; 51:343-353. [PMID: 30692680 PMCID: PMC6908448 DOI: 10.1038/s41588-018-0322-6] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2016] [Accepted: 11/29/2018] [Indexed: 12/31/2022]
Abstract
Loci discovered by genome-wide association studies predominantly map outside protein-coding genes. The interpretation of the functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking by which to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages genome-wide association studies' findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding not offered by current methods. We further assess enrichment of genome-wide association studies for 19 traits within Encyclopedia of DNA Elements- and Roadmap-derived regulatory regions. We characterize unique enrichment patterns for traits and annotations driving novel biological insights. The method is implemented in standalone software and an R package, to facilitate its application by the research community.
|
28
|
Abstract
The Catalogue of Somatic Mutations in Cancer (COSMIC) Cancer Gene Census (CGC) is an expert-curated description of the genes driving human cancer that is used as a standard in cancer genetics across basic research, medical reporting and pharmaceutical development. After a major expansion and complete re-evaluation, the 2018 CGC describes in detail the effect of 719 cancer-driving genes. The recent expansion includes functional and mechanistic descriptions of how each gene contributes to disease generation in terms of the key cancer hallmarks and the impact of mutations on gene and protein function. These functional characteristics depict the extraordinary complexity of cancer biology and suggest multiple cancer-related functions for many genes, which are often highly tissue-dependent or tumour stage-dependent. The 2018 CGC encompasses a second tier, describing an expanding list of genes (currently 145) from more recent cancer studies that show supportive but less detailed indications of a role in cancer.
|
29
|
Ten simple rules for delivering live distance training in bioinformatics across the globe using webinars. PLoS Comput Biol 2018; 14:e1006419. [PMID: 30439935 PMCID: PMC6237289 DOI: 10.1371/journal.pcbi.1006419] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
|
30
|
Uncovering new disease indications for G-protein coupled receptors and their endogenous ligands. BMC Bioinformatics 2018; 19:345. [PMID: 30285606 PMCID: PMC6167889 DOI: 10.1186/s12859-018-2392-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2018] [Accepted: 09/23/2018] [Indexed: 11/29/2022] Open
Abstract
Background: The Open Targets Platform integrates different data sources in order to facilitate identification of potential therapeutic drug targets to treat human diseases. It currently provides evidence for nearly 2.6 million potential target-disease pairs. G-protein coupled receptors are a drug target class of high interest because of the number of successful drugs being developed against them over many years. Here we describe a systematic approach utilizing the Open Targets Platform data to uncover and prioritize potential new disease indications for the G-protein coupled receptors and their ligands. Results: Utilizing the data available in the Open Targets platform, potential G-protein coupled receptor and endogenous ligand disease association pairs were systematically identified. Intriguing examples such as GPR35 for inflammatory bowel disease and CXCR4 for viral infection are used as illustrations of how a systematic approach can aid in the prioritization of interesting drug discovery hypotheses. Combining evidences for G-protein coupled receptors and their corresponding endogenous peptidergic ligands increases confidence and provides supportive evidence for potential new target-disease hypotheses. Comparing such hypotheses to the global pharma drug discovery pipeline to validate the approach showed that more than 93% of G-protein coupled receptor-disease pairs with a high overall Open Targets score involved receptors with an existing drug discovery program. Conclusions: The Open Targets gene-disease score can be used to prioritize potential G-protein coupled receptors-indication hypotheses. In addition, availability of multiple different evidence types markedly increases confidence as does combining evidence from known receptor-ligand pairs. Comparing the top-ranked hypotheses to the current global pharma pipeline serves as validation of our approach and identifies and prioritizes new therapeutic opportunities.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2392-y) contains supplementary material, which is available to authorized users.
|
31
|
Abstract
Determining the functions of human genes is a key objective for understanding disease and enabling development of new therapeutic approaches. A number of recent studies have shown that the amount of attention the research community gives to each of the more than 20,000 human genes is dramatically skewed toward specific, well-known genes. In this issue, Stoeger and colleagues uncover the factors that explain this bias and offer a way ahead to move more genes into the research limelight. Some genes get all the luck. This Primer explores a new analysis as to why the amount of research attention given to each of the more than 20,000 human genes is dramatically skewed towards specific, well known genes, and asks whether we need to take steps to change it.
|
32
|
Transcription Factor Activities Enhance Markers of Drug Sensitivity in Cancer. Cancer Res 2017; 78:769-780. [PMID: 29229604 DOI: 10.1158/0008-5472.can-17-1679] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2017] [Revised: 10/16/2017] [Accepted: 12/04/2017] [Indexed: 12/12/2022]
Abstract
Transcriptional dysregulation induced by aberrant transcription factors (TF) is a key feature of cancer, but its global influence on drug sensitivity has not been examined. Here, we infer the transcriptional activity of 127 TFs through analysis of RNA-seq gene expression data newly generated for 448 cancer cell lines, combined with publicly available datasets to survey a total of 1,056 cancer cell lines and 9,250 primary tumors. Predicted TF activities are supported by their agreement with independent shRNA essentiality profiles and homozygous gene deletions, and recapitulate mutant-specific mechanisms of transcriptional dysregulation in cancer. By analyzing cell line responses to 265 compounds, we uncovered numerous TFs whose activity interacts with anticancer drugs. Importantly, combining existing pharmacogenomic markers with TF activities often improves the stratification of cell lines in response to drug treatment. Our results, which can be queried freely at dorothea.opentargets.io, offer a broad foundation for discovering opportunities to refine personalized cancer therapies. Significance: Systematic analysis of transcriptional dysregulation in cancer cell lines and patient tumor specimens offers a publicly searchable foundation to discover new opportunities to refine personalized cancer therapies. Cancer Res; 78(3); 769-80. ©2017 AACR.
|
33
|
Abstract
The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.
|
35
|
eFORGE: A Tool for Identifying Cell Type-Specific Signal in Epigenomic Data. Cell Rep 2017; 17:2137-2150. [PMID: 27851974 PMCID: PMC5120369 DOI: 10.1016/j.celrep.2016.10.059] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2016] [Revised: 08/25/2016] [Accepted: 09/30/2016] [Indexed: 12/14/2022] Open
Abstract
Epigenome-wide association studies (EWAS) provide an alternative approach for studying human disease through consideration of non-genetic variants such as altered DNA methylation. To advance the complex interpretation of EWAS, we developed eFORGE (http://eforge.cs.ucl.ac.uk/), a new standalone and web-based tool for the analysis and interpretation of EWAS data. eFORGE determines the cell type-specific regulatory component of a set of EWAS-identified differentially methylated positions. This is achieved by detecting enrichment of overlap with DNase I hypersensitive sites across 454 samples (tissues, primary cell types, and cell lines) from the ENCODE, Roadmap Epigenomics, and BLUEPRINT projects. Application of eFORGE to 20 publicly available EWAS datasets identified disease-relevant cell types for several common diseases, a stem cell-like signature in cancer, and demonstrated the ability to detect cell-composition effects for EWAS performed on heterogeneous tissues. Our approach bridges the gap between large-scale epigenomics data and EWAS-derived target selection to yield insight into disease etiology.
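The enrichment-of-overlap idea in this abstract can be illustrated with a simple binomial tail probability. This is a minimal sketch with made-up numbers, not necessarily eFORGE's exact procedure; the function name and the example counts are assumptions for illustration.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 30 of 100 differentially methylated positions overlap
# DNase I hypersensitive sites in one sample, while matched background
# positions overlap at a rate of 10%.
p_value = binomial_upper_tail(k=30, n=100, p=0.10)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value for one sample and not for others would point to a cell type-specific signal; in practice the test would be repeated per sample and corrected for multiple testing.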
|
36
|
Uncovering novel repositioning opportunities using the Open Targets platform. Drug Discov Today 2017; 22:1800-1807. [PMID: 28919242 DOI: 10.1016/j.drudis.2017.09.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2017] [Revised: 08/14/2017] [Accepted: 09/05/2017] [Indexed: 12/12/2022]
Abstract
The recently developed Open Targets platform consolidates a wide range of comprehensive evidence associating known and potential drug targets with human diseases. We have harnessed the integrated data from this platform for novel drug repositioning opportunities. Our computational workflow systematically mines data from various evidence categories and presents potential repositioning opportunities for drugs that are marketed or being investigated in ongoing human clinical trials, based on evidence strength on target-disease pairing. We classified these novel target-disease opportunities in several ways: (i) number of independent counts of evidence; (ii) broad therapy area of origin; and (iii) repositioning within or across therapy areas. Finally, we elaborate on one example that was identified by this approach.
|
37
|
In silico prediction of novel therapeutic targets using gene-disease association data. J Transl Med 2017; 15:182. [PMID: 28851378 PMCID: PMC5576250 DOI: 10.1186/s12967-017-1285-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2017] [Accepted: 08/22/2017] [Indexed: 12/15/2022] Open
Abstract
Background: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. Methods: To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. Results: We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. Conclusions: Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.
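The two evaluation metrics quoted in this abstract, accuracy and AUC, can be computed as follows. This is a minimal sketch on toy labels and scores that are invented for illustration; it does not reproduce the study's classifiers or data.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = known therapeutic target, 0 = not a target (hypothetical).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two are usually reported together.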
|
38
|
Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res 2016; 45:D985-D994. [PMID: 27899665 PMCID: PMC5210543 DOI: 10.1093/nar/gkw1055] [Citation(s) in RCA: 270] [Impact Index Per Article: 33.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2016] [Revised: 10/19/2016] [Accepted: 11/03/2016] [Indexed: 01/16/2023] Open
Abstract
We have designed and developed a data integration and visualization platform that provides evidence about the association of known and potential drug targets with diseases. The platform is designed to support identification and prioritization of biological targets for follow-up. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources. The platform provides either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Users can easily transition between these target- and disease-centric workflows. The Open Targets Validation Platform is accessible at https://www.targetvalidation.org.
|
39
|
Linking rare and common disease: mapping clinical disease-phenotypes to ontologies in therapeutic target validation. J Biomed Semantics 2016; 7:8. [PMID: 27011785 PMCID: PMC4804633 DOI: 10.1186/s13326-016-0051-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2015] [Accepted: 02/02/2016] [Indexed: 12/29/2022] Open
Abstract
Background: The Centre for Therapeutic Target Validation (CTTV - https://www.targetvalidation.org/) was established to generate therapeutic target evidence from genome-scale experiments and analyses. CTTV aims to support the validity of therapeutic targets by integrating existing and newly-generated data. Data integration has been achieved in some resources by mapping metadata such as disease and phenotypes to the Experimental Factor Ontology (EFO). Additionally, the relationship between ontology descriptions of rare and common diseases and their phenotypes can offer insights into shared biological mechanisms and potential drug targets. Ontologies are not ideal for representing the sometimes associated type relationship required. This work addresses two challenges; annotation of diverse big data, and representation of complex, sometimes associated relationships between concepts. Methods: Semantic mapping uses a combination of custom scripting, our annotation tool 'Zooma', and expert curation. Disease-phenotype associations were generated using literature mining on Europe PubMed Central abstracts, which were manually verified by experts for validity. Representation of the disease-phenotype association was achieved by the Ontology of Biomedical AssociatioN (OBAN), a generic association representation model. OBAN represents associations between a subject and object i.e., disease and its associated phenotypes and the source of evidence for that association. The indirect disease-to-disease associations are exposed through shared phenotypes. This was applied to the use case of linking rare to common diseases at the CTTV. Results: EFO yields an average of over 80% of mapping coverage in all data sources. A 42% precision is obtained from the manual verification of the text-mined disease-phenotype associations. This results in 1452 and 2810 disease-phenotype pairs for IBD and autoimmune disease and contributes towards 11,338 rare diseases associations (merged with existing published work [Am J Hum Genet 97:111-24, 2015]). An OBAN result file is downloadable at http://sourceforge.net/p/efo/code/HEAD/tree/trunk/src/efoassociations/. Twenty common diseases are linked to 85 rare diseases by shared phenotypes. A generalizable OBAN model for association representation is presented in this study. Conclusions: Here we present solutions to large-scale annotation-ontology mapping in the CTTV knowledge base, a process for disease-phenotype mining, and propose a generic association model, 'OBAN', as a means to integrate disease using shared phenotypes. Availability: EFO is released monthly and available for download at http://www.ebi.ac.uk/efo/.
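The association model described in this abstract (a subject, an object, and the source of evidence, with indirect disease-to-disease links exposed through shared phenotypes) might be sketched as a small data structure. The class, field and function names below are hypothetical and the records are placeholders; this is not the OBAN schema itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Association:
    """One subject-object association plus its provenance (hypothetical model)."""
    subject: str   # e.g. a disease
    object_: str   # e.g. a phenotype
    source: str    # where the evidence came from


def diseases_linked_by_phenotype(associations):
    """Map each pair of diseases to the set of phenotypes they share."""
    by_phenotype = defaultdict(set)
    for assoc in associations:
        by_phenotype[assoc.object_].add(assoc.subject)
    links = defaultdict(set)
    for phenotype, diseases in by_phenotype.items():
        for first, second in combinations(sorted(diseases), 2):
            links[(first, second)].add(phenotype)
    return dict(links)


# Placeholder records; names and sources are invented for illustration.
records = [
    Association("common disease A", "phenotype X", "text mining"),
    Association("rare disease B", "phenotype X", "curated resource"),
    Association("rare disease B", "phenotype Y", "curated resource"),
]
links = diseases_linked_by_phenotype(records)
print(links)
```

Storing the source on every association keeps each indirect disease-to-disease link traceable back to the evidence that produced it.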
|
40
|
Ensembl regulation resources. Database (Oxford) 2016; 2016:bav119. [PMID: 26888907 PMCID: PMC4756621 DOI: 10.1093/database/bav119] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/25/2015] [Accepted: 11/24/2015] [Indexed: 12/11/2022]
Abstract
New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl’s regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org
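To illustrate the REST access mentioned in this abstract, the sketch below builds a URL that would request regulatory features overlapping a genomic region. It makes no network call. The server name `rest.ensembl.org` is not given in the abstract and is an assumption here, as is the choice of the `overlap/region` endpoint with `feature=regulatory`; the region is an arbitrary placeholder.

```python
from urllib.parse import urlencode

# Assumption: the public Ensembl REST server; the abstract does not name it.
REST_SERVER = "https://rest.ensembl.org"


def regulatory_overlap_url(region, species="human"):
    """Build a URL asking for regulatory features that overlap a region.

    JSON output would be requested separately, for example with a
    'Content-Type: application/json' request header.
    """
    query = urlencode({"feature": "regulatory"})
    return f"{REST_SERVER}/overlap/region/{species}/{region}?{query}"


# Arbitrary placeholder region in chromosome:start-end form.
url = regulatory_overlap_url("1:1000000-1001000")
print(url)
```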
|
41
|
Abstract
Jeffrey Barrett, Ian Dunham and Ewan Birney discuss the initiatives of the newly founded Centre for Therapeutic Target Validation, including a range of approaches to use human genetics to inform drug discovery and make better medicines.
|
42
|
Abstract
The last few decades have utterly transformed genetics and genomics, but what might the next ten years bring? PLOS Biology asked eight leaders spanning a range of related areas to give us their predictions. Without exception, the predictions are for more data on a massive scale and of more diverse types. All are optimistic and predict enormous positive impact on scientific understanding, while a recurring theme is the benefit of such data for the transformation and personalization of medicine. Several also point out that the biggest changes will very likely be those that we don’t foresee, even now.
|
43
|
Correction: Quantitative Genetics of CTCF Binding Reveal Local Sequence Effects and Different Modes of X-Chromosome Association. PLoS Genet 2015; 11:e1005177. [PMID: 25919664 PMCID: PMC4412500 DOI: 10.1371/journal.pgen.1005177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|
44
|
Abstract
Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants in protein-coding regions, our understanding of the genetic code and splicing allows us to identify likely candidates, but interpreting variants outside genic regions is more difficult. Here we present genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.
|
45
|
Genome-wide meta-analysis identifies new susceptibility loci for migraine. Nat Genet 2013; 45:912-917. [PMID: 23793025 PMCID: PMC4041123 DOI: 10.1038/ng.2676] [Citation(s) in RCA: 276] [Impact Index Per Article: 25.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2012] [Accepted: 05/30/2013] [Indexed: 12/15/2022]
Abstract
Migraine is the most common brain disorder, affecting approximately 14% of the adult population, but its molecular mechanisms are poorly understood. We report the results of a meta-analysis across 29 genome-wide association studies, including a total of 23,285 individuals with migraine (cases) and 95,425 population-matched controls. We identified 12 loci associated with migraine susceptibility (P < 5 × 10⁻⁸). Five loci are new: near AJAP1 at 1p36, near TSPAN2 at 1p13, within FHL5 at 6q16, within C7orf10 at 7p14 and near MMP16 at 8q21. Three of these loci were identified in disease subgroup analyses. Brain tissue expression quantitative trait locus analysis suggests potential functional candidate genes at four loci: APOA1BP, TBC1D7, FUT9, STAT6 and ATP5B.
|
46
|
High-resolution analysis of cis-acting regulatory networks at the α-globin locus. Philos Trans R Soc Lond B Biol Sci 2013; 368:20120361. [PMID: 23650635 DOI: 10.1098/rstb.2012.0361] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022] Open
Abstract
We have combined the circular chromosome conformation capture protocol with high-throughput, genome-wide sequence analysis to characterize the cis-acting regulatory network at a single locus. In contrast to methods which identify large interacting regions (10-1000 kb), the 4C approach provides a comprehensive, high-resolution analysis of a specific locus with the aim of defining, in detail, the cis-regulatory elements controlling a single gene or gene cluster. Using the human α-globin locus as a model, we detected all known local and long-range interactions with this gene cluster. In addition, we identified two interactions with genes located 300 kb (NME4) and 625 kb (FAM173a) from the α-globin cluster.
|
47
|
Abstract
The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotations of non-coding regulatory elements correlate strongly with mammalian evolutionary constraint and provide an unbiased approach for evaluating metrics of evolutionary constraint in human. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.
|
48
|
Abstract
The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species; and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
|
49
|
MiR-25 regulates Wwp2 and Fbxw7 and promotes reprogramming of mouse fibroblast cells to iPSCs. PLoS One 2012; 7:e40938. [PMID: 22912667 PMCID: PMC3422229 DOI: 10.1371/journal.pone.0040938] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 02/10/2012] [Accepted: 06/14/2012] [Indexed: 11/28/2022] Open
Abstract
Background: miRNAs are a class of small non-coding RNAs that regulate gene expression and have critical functions in various biological processes. Hundreds of miRNAs have been identified in mammalian genomes but only a small number of them have been functionally characterized. Recent studies also demonstrate that some miRNAs have important roles in reprogramming somatic cells to induced pluripotent stem cells (iPSCs). Methods: We screened 52 miRNAs cloned in a piggybac (PB) vector for their roles in reprogramming of mouse embryonic fibroblast cells to iPSCs. To identify targets of miRNAs, we made Dgcr8-deficient embryonic stem (ES) cells and introduced miRNA mimics to these cells, which lack miRNA biogenesis. The direct target genes of miRNA were identified through global gene expression analysis and target validation. Results and conclusion: We found that over-expressing miR-25 or introducing miR-25 mimics enhanced production of iPSCs. We identified a number of miR-25 candidate gene targets. Of particular interest were two ubiquitin ligases, Wwp2 and Fbxw7, which have been proposed to regulate Oct4, c-Myc and Klf5, respectively. Our findings thus highlight the complex interplay between miRNAs and transcription factors involved in reprogramming, stem cell self-renewal and maintenance of pluripotency.
|
50
|
Abstract
The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
|