1
Ding X, Zhu X. Locating potentially lethal genes using the abnormal distributions of genotypes. Sci Rep 2019; 9:10543. [PMID: 31332212 PMCID: PMC6646374 DOI: 10.1038/s41598-019-47076-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2019] [Accepted: 07/10/2019] [Indexed: 11/09/2022]
Abstract
Genes are the basic functional units of heredity. Differences in genes can lead to various congenital physical conditions. One kind of difference is caused by genetic variations called single nucleotide polymorphisms (SNPs). An SNP is a variation in a single nucleotide that occurs at a specific position in the genome. Some SNPs can affect splice sites and protein structures and cause gene abnormalities. SNPs on paired chromosomes may lead to fatal diseases, so that a fertilized embryo cannot develop into a normal fetus or people born with these abnormalities die in childhood. The distributions of genotypes at these SNP sites differ from those at other sites. Based on this idea, we present a novel statistical method to detect abnormal distributions of genotypes and locate potentially lethal genes. The test was performed on HapMap data and 74 suspicious SNPs were found. Ten of these SNPs map to “reviewed” genes in the NCBI database. Among them, 5 genes were related to fatal childhood diseases or embryonic development, 1 gene can cause spermatogenic failure, and the other 4 genes were associated with many genetic diseases. The results validated our method. The method is simple and is backed by a statistical test. It is an inexpensive way to discover potentially lethal genes and their mutation sites. The mined genes deserve further study.
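The abstract does not say which statistical test the authors use. As a rough illustration of what "an abnormal distribution of genotypes" can mean, the sketch below applies a standard Hardy-Weinberg chi-square goodness-of-fit test (an assumed stand-in, not the paper's method) to genotype counts at one SNP. The function name and the example counts are hypothetical.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of genotype counts against Hardy-Weinberg proportions.

    Assumption: this is a generic stand-in for the (unspecified) test in the
    abstract. Returns (chi2, p_value, expected_counts), using 1 degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele a
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected

# Hypothetical counts: heterozygotes are common but one homozygote is absent,
# the kind of pattern a recessive lethal variant could produce.
chi2, p_value, expected = hwe_chi_square(40, 60, 0)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, expected={expected}")
```

A small p-value only flags a departure from Hardy-Weinberg proportions; genotyping error or population structure can produce the same signal, so flagged sites would still need follow-up.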
Affiliation(s)
- Xiaojun Ding
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000, China.
- Xiaoshu Zhu
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000, China.

2
Schöneberg T, Meister J, Knierim AB, Schulz A. The G protein-coupled receptor GPR34 - The past 20 years of a grownup. Pharmacol Ther 2018; 189:71-88. [PMID: 29684466 DOI: 10.1016/j.pharmthera.2018.04.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 12/17/2022]
Abstract
Research on GPR34, which was discovered in 1999 as an orphan G protein-coupled receptor of the rhodopsin-like class, disclosed its physiologic relevance only piece by piece. Being present in all recent vertebrate genomes analyzed so far it seems to improve the fitness of species although it is not essential for life and reproduction as GPR34-deficient mice demonstrate. However, closer inspection of macrophages and microglia, where it is mainly expressed, revealed its relevance in immune cell function. Recent data clearly demonstrate that GPR34 function is required to arrest microglia in the M0 homeostatic non-phagocytic phenotype. Herein, we summarize the current knowledge on its evolution, genomic and structural organization, physiology, pharmacology and relevance in human diseases including neurodegenerative diseases and cancer, which accumulated over the last 20 years.
Affiliation(s)
- Torsten Schöneberg
- Rudolf Schönheimer Institute of Biochemistry, Molecular Biochemistry, Medical Faculty, University of Leipzig, 04103 Leipzig, Germany.
- Jaroslawna Meister
- Molecular Signaling Section, Laboratory of Bioorganic Chemistry, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, United States
- Alexander Bernd Knierim
- Rudolf Schönheimer Institute of Biochemistry, Molecular Biochemistry, Medical Faculty, University of Leipzig, 04103 Leipzig, Germany; Leipzig University Medical Center, IFB AdiposityDiseases, 04103 Leipzig, Germany
- Angela Schulz
- Rudolf Schönheimer Institute of Biochemistry, Molecular Biochemistry, Medical Faculty, University of Leipzig, 04103 Leipzig, Germany

3
Bohler A, Wu G, Kutmon M, Pradhana LA, Coort SL, Hanspers K, Haw R, Pico AR, Evelo CT. Reactome from a WikiPathways Perspective. PLoS Comput Biol 2016; 12:e1004941. [PMID: 27203685 PMCID: PMC4874630 DOI: 10.1371/journal.pcbi.1004941] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 11/30/2015] [Accepted: 04/24/2016] [Indexed: 12/31/2022]
Abstract
Reactome and WikiPathways are two of the most popular freely available databases for biological pathways. Reactome pathways are centrally curated with periodic input from selected domain experts. WikiPathways is a community-based platform where pathways are created and continually curated by any interested party. The nascent collaboration between WikiPathways and Reactome illustrates the mutual benefits of combining these two approaches. We created a format converter that converts Reactome pathways to the GPML format used in WikiPathways. In addition, we developed the ComplexViz plugin for PathVisio which simplifies looking up complex components. The plugin can also score the complexes on a pathway based on a user defined criterion. This score can then be visualized on the complex nodes using the visualization options provided by the plugin. Using the merged collection of curated and converted Reactome pathways, we demonstrate improved pathway coverage of relevant biological processes for the analysis of a previously described polycystic ovary syndrome gene expression dataset. Additionally, this conversion allows researchers to visualize their data on Reactome pathways using PathVisio's advanced data visualization functionalities. WikiPathways benefits from the dedicated focus and attention provided to the content converted from Reactome and the wealth of semantic information about interactions. Reactome in turn benefits from the continuous community curation available on WikiPathways. The research community at large benefits from the availability of a larger set of pathways for analysis in PathVisio and Cytoscape. The pathway statistics results obtained from PathVisio are significantly better when using a larger set of candidate pathways for analysis. The conversion serves as a general model for integration of multiple pathway resources developed using different approaches.
Affiliation(s)
- Anwesha Bohler
- Department of Bioinformatics—BiGCaT, Maastricht University, Maastricht, The Netherlands
- Guanming Wu
- Ontario Institute for Cancer Research, MaRS Centre, Toronto, Ontario, Canada
- Martina Kutmon
- Department of Bioinformatics—BiGCaT, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Leontius Adhika Pradhana
- Department of Pharmacy, Faculty of Science, National University of Singapore, Singapore, Republic of Singapore
- Susan L. Coort
- Department of Bioinformatics—BiGCaT, Maastricht University, Maastricht, The Netherlands
- Kristina Hanspers
- Gladstone Institutes, San Francisco, California, United States of America
- Robin Haw
- Ontario Institute for Cancer Research, MaRS Centre, Toronto, Ontario, Canada
- Alexander R. Pico
- Gladstone Institutes, San Francisco, California, United States of America
- Chris T. Evelo
- Department of Bioinformatics—BiGCaT, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands

4
Ding X, Wang J, Zelikovsky A, Guo X, Xie M, Pan Y. Searching High-Order SNP Combinations for Complex Diseases Based on Energy Distribution Difference. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:695-704. [PMID: 26357280 DOI: 10.1109/tcbb.2014.2363459] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Single nucleotide polymorphisms, a dominant type of genetic variant, have been used successfully to identify defective genes causing human single-gene diseases. However, most common human diseases are complex diseases caused by gene-gene and gene-environment interactions. Many SNP-SNP interaction analysis methods have been introduced, but they are not powerful enough to discover interactions among more than three SNPs. The paper proposes a novel method that analyzes all SNPs simultaneously. Unlike existing methods, it regards an individual's genotype data on a list of SNPs as a point with a unit of energy in a multi-dimensional space, and tries to find a new coordinate system in which the energy distribution difference between cases and controls reaches its maximum. The method then finds multi-SNP combinatorial patterns that differ between cases and controls based on the new coordinate system. The experiment on simulated data shows that the method is efficient. Tests on real data for age-related macular degeneration (AMD) show that it can find more significant multi-SNP combinatorial patterns than existing methods.

5
Graae L, Paddock S, Belin AC. ReMo-SNPs: a new software tool for identification of polymorphisms in regions and motifs genome-wide. Genet Res (Camb) 2015; 97:e8. [PMID: 25882789 PMCID: PMC6863641 DOI: 10.1017/s0016672315000051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/06/2015] [Revised: 02/23/2015] [Accepted: 02/25/2015] [Indexed: 12/13/2022]
Abstract
Studies of complex genetic diseases have revealed many risk factors of small effect, but the combined amount of heritability explained is still low. Genome-wide association studies are often underpowered to identify true effects because of the very large number of parallel tests. There is, therefore, a great need to generate data sets that are enriched for those markers that have an increased a priori chance of being functional, such as markers in genomic regions involved in gene regulation. ReMo-SNPs is a computational program developed to aid researchers in the process of selecting functional SNPs for association analyses in user-specified regions and/or motifs genome-wide. The useful feature of automatic selection of genotyped markers in the user-provided material makes the output data ready to be used in a following association study. In this article we describe the program and its functions. We also validate the program by including an example study on three different transcription factors and results from an association study on two psychiatric phenotypes. The flexibility of the ReMo-SNPs program enables the user to study any region or sequence of interest, without limitation to transcription factor binding regions and motifs. The program is freely available at: http://www.neuro.ki.se/ReMo-SNPs/.
Affiliation(s)
- Lisette Graae
- Department of Neuroscience, Karolinska Institutet, Retzius väg 8, 171 77 Stockholm
- Andrea Carmine Belin
- Department of Neuroscience, Karolinska Institutet, Retzius väg 8, 171 77 Stockholm

6
ARYANA: Aligning Reads by Yet Another Approach. BMC Bioinformatics 2014; 15 Suppl 9:S12. [PMID: 25252881 PMCID: PMC4168712 DOI: 10.1186/1471-2105-15-s9-s12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Motivation: Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $10^6 prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-consensus algorithms, which depend on fast and accurate alignment. Contribution: We introduce ARYANA, a fast gapped read aligner built on the BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners (Bowtie2, BWA and SeqAlto) with comparable generality and accuracy. Instead of time-consuming backtracking procedures for handling mismatches, ARYANA uses a seed-and-extend algorithmic framework and achieves significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases, ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in line with the trend toward longer reads as sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using the ARYANA engine. Availability: ARYANA with complete source code can be obtained from http://github.com/aryana-aligner

7
Teng M, Ichikawa S, Padgett LR, Wang Y, Mort M, Cooper DN, Koller DL, Foroud T, Edenberg HJ, Econs MJ, Liu Y. regSNPs: a strategy for prioritizing regulatory single nucleotide substitutions. Bioinformatics 2012; 28:1879-86. [PMID: 22611130 PMCID: PMC3389767 DOI: 10.1093/bioinformatics/bts275] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
Motivation: One of the fundamental questions in genetics is to identify functional DNA variants that are responsible for a disease or phenotype of interest. Results from large-scale genetics studies, such as genome-wide association studies (GWAS), and the availability of high-throughput sequencing technologies provide opportunities for identifying causal variants. Despite the technical advances, informatics methodologies need to be developed to prioritize thousands of variants for potential causative effects. Results: We present regSNPs, an informatics strategy that integrates several established bioinformatics tools for prioritizing regulatory SNPs, i.e. the SNPs in the promoter regions that potentially affect phenotype through changing transcription of downstream genes. Compared with existing tools, regSNPs has two distinct features. It considers degenerative features of binding motifs by calculating the differences in binding affinity caused by the candidate variants, and it integrates potential phenotypic effects of various transcription factors. When tested using the disease-causing variants documented in the Human Gene Mutation Database, regSNPs showed mixed performance on various diseases. regSNPs predicted three SNPs that can potentially affect bone density in a region detected in an earlier linkage study. Potential effects of one of the variants were validated using a luciferase reporter assay. Contact: yunliu@iupui.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mingxiang Teng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

8
Wang J, Ronaghi M, Chong SS, Lee CGL. pfSNP: An integrated potentially functional SNP resource that facilitates hypotheses generation through knowledge syntheses. Hum Mutat 2011; 32:19-24. [PMID: 20672376 DOI: 10.1002/humu.21331] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022]
Abstract
Currently, >14,000,000 single nucleotide polymorphisms (SNPs) are reported. Identifying phenotype-affecting SNPs among these many SNPs poses significant challenges. Although several Web resources are available that can inform about the functionality of SNPs, these resources are mainly annotation databases and are not very comprehensive. In this article, we present a comprehensive, well-annotated, integrated pfSNP (potentially functional SNPs) Web resource (http://pfs.nus.edu.sg/), which aims to facilitate better hypothesis generation through knowledge syntheses mediated by better data integration and a user-friendly Web interface. pfSNP integrates >40 different algorithms/resources to interrogate >14,000,000 SNPs from the dbSNP database for SNPs of potential functional significance based on previously published reports, potential functionality inferred from genetic approaches, and potential functionality predicted from sequence motifs. Its query interface has a user-friendly "auto-complete, prompt-as-you-type" feature and is highly customizable, facilitating different combinations of queries using Boolean logic. Additionally, to facilitate better understanding of the results and aid in hypothesis generation, gene/pathway-level information with text clouds highlighting enriched tissues/pathways, as well as detailed related information, is provided on the results page. Hence, the pfSNP resource will be of great interest to scientists focusing on association studies as well as those interested in experimentally addressing the functionality of SNPs.
Affiliation(s)
- Jingbo Wang
- Department of Biochemistry Yong Loo Lin School of Medicine, National University of Singapore, Singapore

9
Liu X, Jian X, Boerwinkle E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 2011; 32:894-9. [PMID: 21520341 PMCID: PMC3145015 DOI: 10.1002/humu.21517] [Citation(s) in RCA: 568] [Impact Index Per Article: 43.7] [Indexed: 12/31/2022]
Abstract
With the advance of sequencing technologies, whole exome sequencing has increasingly been used to identify mutations that cause human diseases, especially rare Mendelian diseases. Among the analysis steps, functional prediction (of being deleterious) plays an important role in filtering or prioritizing nonsynonymous SNP (NS) for further analysis. Unfortunately, different prediction algorithms use different information and each has its own strength and weakness. It has been suggested that investigators should use predictions from multiple algorithms instead of relying on a single one. However, querying predictions from different databases/Web-servers for different algorithms is both tedious and time consuming, especially when dealing with a huge number of NSs identified by exome sequencing. To facilitate the process, we developed dbNSFP (database for nonsynonymous SNPs' functional predictions). It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS in the human genome (a total of 75,931,005). It is the first integrated database of functional predictions from multiple algorithms for the comprehensive collection of human NSs. dbNSFP is freely available for download at http://sites.google.com/site/jpopgen/dbNSFP.
Affiliation(s)
- Xiaoming Liu
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA.

10
Li MJ, Wang P, Liu X, Lim EL, Wang Z, Yeager M, Wong MP, Sham PC, Chanock SJ, Wang J. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res 2011; 40:D1047-54. [PMID: 22139925 PMCID: PMC3245026 DOI: 10.1093/nar/gkr1182] [Citation(s) in RCA: 152] [Impact Index Per Article: 11.7] [Indexed: 12/26/2022]
Abstract
Recent advances in genome-wide association studies (GWAS) have enabled us to identify thousands of genetic variants (GVs) that are associated with human diseases. As next-generation sequencing technologies become less expensive, more GVs will be discovered in the near future. Existing databases, such as the NHGRI GWAS Catalog, collect GVs with only genome-wide level significance. However, many true disease susceptibility loci have relatively moderate P values and are not included in these databases. We have developed GWASdb that contains 20 times more data than the GWAS Catalog and includes less significant GVs (P < 1.0 × 10^-3) manually curated from the literature. In addition, GWASdb provides comprehensive functional annotations for each GV, including genomic mapping information, regulatory effects (transcription factor binding sites, microRNA target sites and splicing sites), amino acid substitutions, evolution, gene expression and disease associations. Furthermore, GWASdb classifies these GVs according to diseases using Disease-Ontology Lite and Human Phenotype Ontology. It can conduct pathway enrichment and PPI network association analysis for these diseases. GWASdb provides an intuitive, multifunctional database for biologists and clinicians to explore GVs and their functional inferences. It is freely available at http://jjwanglab.org/gwasdb and will be updated frequently.
Affiliation(s)
- Mulin Jun Li
- Department of Biochemistry, The University of Hong Kong, Hong Kong SAR, China

11
Taniya T, Tanaka S, Yamaguchi-Kabata Y, Hanaoka H, Yamasaki C, Maekawa H, Barrero RA, Lenhard B, Datta MW, Shimoyama M, Bumgarner R, Chakraborty R, Hopkinson I, Jia L, Hide W, Auffray C, Minoshima S, Imanishi T, Gojobori T. A prioritization analysis of disease association by data-mining of functional annotation of human genes. Genomics 2011; 99:1-9. [PMID: 22019378 DOI: 10.1016/j.ygeno.2011.10.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/01/2011] [Revised: 09/16/2011] [Accepted: 10/06/2011] [Indexed: 11/15/2022]
Abstract
Complex diseases result from contributions of multiple genes that act in concert through pathways. Here we present a method to prioritize novel candidates of disease-susceptibility genes depending on the biological similarities to the known disease-related genes. The extent of disease-susceptibility of a gene is prioritized by analyzing seven features of human genes captured in H-InvDB. Taking rheumatoid arthritis (RA) and prostate cancer (PC) as two examples, we evaluated the efficiency of our method. Highly scored genes obtained included TNFSF12 and OSM as candidate disease genes for RA and PC, respectively. Subsequent characterization of these genes based upon an extensive literature survey reinforced the validity of these highly scored genes as possible disease-susceptibility genes. Our approach, Prioritization ANalysis of Disease Association (PANDA), is an efficient and cost-effective method to narrow down a large set of genes into smaller subsets that are most likely to be involved in the disease pathogenesis.
Affiliation(s)
- Takayuki Taniya
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, AIST Bio-IT Research Building 7F, 2-4-7 Aomi, Tokyo 135-0064, Japan

12
Benson CC, Zhou Q, Long X, Miano JM. Identifying functional single nucleotide polymorphisms in the human CArGome. Physiol Genomics 2011; 43:1038-48. [PMID: 21771879 DOI: 10.1152/physiolgenomics.00098.2011] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 01/22/2023]
Abstract
Regulatory SNPs (rSNPs) reside primarily within the nonprotein coding genome and are thought to disturb normal patterns of gene expression by altering DNA binding of transcription factors. Nevertheless, despite the explosive rise in SNP association studies, there is little information as to the function of rSNPs in human disease. Serum response factor (SRF) is a widely expressed DNA-binding transcription factor that has variable affinity to at least 1,216 permutations of a 10 bp transcription factor binding site (TFBS) known as the CArG box. We developed a robust in silico bioinformatics screening method to evaluate sequences around RefSeq genes for conserved CArG boxes. Utilizing a predetermined phastCons threshold score, we identified 8,252 strand-specific CArGs within an 8 kb window around the transcription start site of 5,213 genes, including all previously defined SRF target genes. We then interrogated this CArG dataset for the presence of previously annotated common polymorphisms. We found a total of 118 unique CArG boxes harboring a SNP within the 10 bp CArG sequence and 1,130 CArG boxes with SNPs located just outside the CArG element. Gel shift and luciferase reporter assays validated SRF binding and functional activity of several new CArG boxes. Importantly, SNPs within or just outside the CArG box often resulted in altered SRF binding and activity. Collectively, these findings demonstrate a powerful approach to computationally define rSNPs in the human CArGome and provide a foundation for similar analyses of other TFBS. Such information may find utility in genetic association studies of human disease where little insight is known regarding the functionality of rSNPs.
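The figure of 1,216 permutations of the 10 bp CArG box can be reproduced under one assumption that the abstract does not state: take the commonly cited consensus CC(A/T)6GG and also allow any single-base deviation from it. The sketch below enumerates that set and scans a sequence for matches; the function names and the example sequence are hypothetical, and conservation filtering (phastCons) is not modeled.

```python
from itertools import product

def carg_like_sites():
    """Enumerate CC[A/T]6GG plus every one-base deviation (assumed definition)."""
    consensus = {"CC" + "".join(core) + "GG" for core in product("AT", repeat=6)}
    sites = set(consensus)
    for seq in consensus:
        for i, base in enumerate(seq):
            for alt in "ACGT":
                if alt != base:
                    sites.add(seq[:i] + alt + seq[i + 1:])
    return consensus, sites

def scan_for_carg(sequence, sites):
    """Return (offset, 10-mer) pairs for every window of `sequence` found in `sites`."""
    sequence = sequence.upper()
    return [(i, sequence[i:i + 10])
            for i in range(len(sequence) - 9)
            if sequence[i:i + 10] in sites]

consensus, sites = carg_like_sites()
print(len(consensus), len(sites))
print(scan_for_carg("TTCCATATATGGTT", sites))
```

Checking whether a SNP falls inside a hit then reduces to comparing the SNP coordinate with each reported offset and the nine positions after it.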
Affiliation(s)
- Craig C Benson
- University of Rochester Medical Center, Rochester, NY, USA

13
Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res 2011; 21:1109-21. [PMID: 21536720 DOI: 10.1101/gr.118992.110] [Citation(s) in RCA: 488] [Impact Index Per Article: 37.5] [Indexed: 12/19/2022]
Abstract
Network "guilt by association" (GBA) is a proven approach for identifying novel disease genes based on the observation that similar mutational phenotypes arise from functionally related genes. In principle, this approach could account even for nonadditive genetic interactions, which underlie the synergistic combinations of mutations often linked to complex diseases. Here, we analyze a large-scale, human gene functional interaction network (dubbed HumanNet). We show that candidate disease genes can be effectively identified by GBA in cross-validated tests using label propagation algorithms related to Google's PageRank. However, GBA has been shown to work poorly in genome-wide association studies (GWAS), where many genes are somewhat implicated, but few are known with very high certainty. Here, we resolve this by explicitly modeling the uncertainty of the associations and incorporating the uncertainty for the seed set into the GBA framework. We observe a significant boost in the power to detect validated candidate genes for Crohn's disease and type 2 diabetes by comparing our predictions to results from follow-up meta-analyses, with incorporation of the network serving to highlight the JAK-STAT pathway and associated adaptors GRB2/SHC1 in Crohn's disease and BACH2 in type 2 diabetes. Consideration of the network during GWAS thus conveys some of the benefits of enrolling more participants in the GWAS study. More generally, we demonstrate that a functional network of human genes provides a valuable statistical framework for prioritizing candidate disease genes, both for candidate gene-based and GWAS-based studies.
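To make the idea of PageRank-style label propagation with uncertain seeds concrete, here is a minimal sketch of a random walk with restart on a toy gene network. It is an illustration only, not the HumanNet implementation: the graph, gene names, seed weights, and damping value are all assumptions.

```python
def propagate_scores(adjacency, seed_weights, damping=0.85, iterations=50):
    """Random walk with restart (personalized PageRank) over an undirected graph.

    adjacency: dict mapping each gene to a list of neighbouring genes.
    seed_weights: dict of non-negative evidence weights for seed genes, e.g.
    derived from GWAS association strength (hypothetical values here).
    """
    nodes = list(adjacency)
    total = sum(seed_weights.values())
    restart = {v: seed_weights.get(v, 0.0) / total for v in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        updated = {v: (1.0 - damping) * restart[v] for v in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if not neighbours:
                # Isolated gene: send its mass back to the restart distribution.
                for u in nodes:
                    updated[u] += damping * scores[v] * restart[u]
                continue
            share = damping * scores[v] / len(neighbours)
            for u in neighbours:
                updated[u] += share
        scores = updated
    return scores

# Toy path network g1 - g2 - g3 - g4 - g5 with two seeds of unequal confidence.
network = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4"],
}
scores = propagate_scores(network, {"g1": 0.9, "g2": 0.5})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 4))
```

Weighting the restart distribution by evidence strength, rather than treating every seed as certain, is one simple way to carry seed uncertainty into the ranking.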
Affiliation(s)
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 262 Seongsanno, Seodaemun-gu, Seoul, Korea.

14
Data integration workflow for search of disease driving genes and genetic variants. PLoS One 2011; 6:e18636. [PMID: 21533266 PMCID: PMC3075259 DOI: 10.1371/journal.pone.0018636] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/27/2010] [Accepted: 03/11/2011] [Indexed: 11/29/2022]
Abstract
Comprehensive characterization of a gene's impact on phenotypes requires knowledge of the context of the gene. To address this issue we introduce a systematic data integration method, Candidate Genes and SNPs (CANGES), that links SNP and linkage disequilibrium data to pathway and protein-protein interaction information. It can be used as a knowledge discovery tool to search for disease-associated causative variants from genome-wide studies as well as to generate new hypotheses on synergistically functioning genes. We demonstrate the utility of CANGES by integrating pathway and protein-protein interaction data to identify putative functional variants for (i) the p53 gene and (ii) three glioblastoma multiforme (GBM) associated risk genes. For the GBM case, we further integrate the CANGES results with clinical and genome-wide data for 209 GBM patients and identify genes having effects on GBM patient survival. Our results show that selecting a focused set of genes can yield information beyond the traditional genome-wide association approaches. Taken together, a holistic approach to identifying possible interacting genes and SNPs with CANGES provides a means to rapidly identify networks for any set of genes and generate novel hypotheses. CANGES is available at http://csbi.ltdk.helsinki.fi/CANGES/

15
Grady BJ, Ritchie MD. Statistical Optimization of Pharmacogenomics Association Studies: Key Considerations from Study Design to Analysis. Curr Pharmacogenomics Person Med 2011; 9:41-66. [PMID: 21887206 PMCID: PMC3163263 DOI: 10.2174/187569211794728805] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/11/2023]
Abstract
Research in human genetics and genetic epidemiology has grown significantly over the previous decade, particularly in the field of pharmacogenomics. Pharmacogenomics presents an opportunity for rapid translation of associated genetic polymorphisms into diagnostic measures or tests to guide therapy as part of a move towards personalized medicine. Expansion in genotyping technology has cleared the way for widespread use of whole-genome genotyping in the effort to identify novel biology and new genetic markers associated with pharmacokinetic and pharmacodynamic endpoints. With new technology and methodology regularly becoming available for use in genetic studies, a discussion on the application of such tools becomes necessary. In particular, quality control criteria have evolved with the use of GWAS as we have come to understand potential systematic errors which can be introduced into the data during genotyping. There have been several replicated pharmacogenomic associations, some of which have moved to the clinic to enact change in treatment decisions. These examples of translation illustrate the strength of evidence necessary to successfully and effectively translate a genetic discovery. In this review, the design of pharmacogenomic association studies is examined with the goal of optimizing the impact and utility of this research. Issues of ascertainment, genotyping, quality control, analysis and interpretation are considered.
Affiliation(s)
- Benjamin J. Grady
- Department of Molecular Physiology & Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
- Marylyn D. Ritchie
- Department of Molecular Physiology & Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
|
16
|
Hansen HM, Xiao Y, Rice T, Bracci PM, Wrensch MR, Sison JD, Chang JS, Smirnov IV, Patoka J, Seldin MF, Quesenberry CP, Kelsey KT, Wiencke JK. Fine mapping of chromosome 15q25.1 lung cancer susceptibility in African-Americans. Hum Mol Genet 2010; 19:3652-61. [PMID: 20587604 DOI: 10.1093/hmg/ddq268] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 12/20/2022]
Abstract
Several genome-wide association studies identified the chr15q25.1 region, which includes three nicotinic cholinergic receptor genes (CHRNA5-B4) and the cell proliferation gene (PSMA4), for its association with lung cancer risk in Caucasians. A haplotype and its tagging single nucleotide polymorphisms (SNPs) encompassing six genes from IREB2 to CHRNB4 were most strongly associated with lung cancer risk (OR = 1.3; P < 10⁻²⁰). In order to narrow the region of association and identify potential causal variations, we performed a fine-mapping study using 77 SNPs in a 194 kb segment of the 15q25.1 region in a sample of 448 African-American lung cancer cases and 611 controls. Four regions (two SNPs and two distinct haplotypes from sliding-window analyses) were associated with lung cancer. CHRNA5 rs17486278 G had OR = 1.28, 95% CI 1.07-1.54 and P = 0.008, whereas CHRNB4 rs7178270 G had OR = 0.78, 95% CI 0.66-0.94 and P = 0.008 for lung cancer risk. Lung cancer associations remained significant after pack-year adjustment. The SNP rs7178270 decreased lung cancer risk in women but not in men (gender interaction P = 0.009). For two SNPs (rs7168796 A/G and rs7164594 A/G) upstream of PSMA4, lung cancer risks for people with haplotypes GG and AA were reduced compared with those with AG (OR = 0.56, 95% CI 0.38-0.82, P = 0.003 and OR = 0.73, 95% CI 0.59-0.90, P = 0.004, respectively). A four-SNP haplotype spanning CHRNA5 (rs11637635 C, rs17408276 T, rs16969968 G) and CHRNA3 (rs578776 G) was associated with increased lung cancer risk (P = 0.002). The identified regions contain SNPs predicted to affect gene regulation. There are multiple lung cancer risk loci in the 15q25.1 region in African-Americans.
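As a rough consistency check on the figures quoted in this abstract, the sketch below recovers the standard error, z statistic and two-sided P value implied by an odds ratio and its 95% confidence interval. It assumes the intervals are Wald intervals on the log-odds scale, which the abstract does not state; the function name `wald_p_from_ci` is hypothetical.

```python
import math

def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Return (SE, z, two-sided p) implied by an OR and its 95% CI.

    Assumption: the CI is a Wald interval, exp(log(OR) +/- z_crit * SE).
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values copied from the abstract (rounded there to two decimals).
reported = [
    ("CHRNA5 rs17486278 G", 1.28, 1.07, 1.54),
    ("CHRNB4 rs7178270 G", 0.78, 0.66, 0.94),
]

for label, odds_ratio, low, high in reported:
    se, z, p = wald_p_from_ci(odds_ratio, low, high)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Because the published odds ratios and limits are rounded, the recovered P values should only be expected to land near the reported P = 0.008, not to reproduce it exactly.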
Affiliation(s)
- Helen M Hansen
- Division of Neuroepidemiology, Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, USA
|
17
|
Chang HW, Cheng YH, Chuang LY, Yang CH. SNP-RFLPing 2: an updated and integrated PCR-RFLP tool for SNP genotyping. BMC Bioinformatics 2010; 11:173. [PMID: 20377871 PMCID: PMC2858040 DOI: 10.1186/1471-2105-11-173] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 11/03/2009] [Accepted: 04/08/2010] [Indexed: 11/10/2022]
Abstract
Background: The PCR-restriction fragment length polymorphism (RFLP) assay is a cost-effective method for SNP genotyping and mutation detection, but manual mining for restriction enzyme sites is challenging and cumbersome. Three years after we constructed SNP-RFLPing, a freely accessible database and analysis tool for restriction enzyme mining of SNPs, significant improvements over the 2006 version have been made and incorporated into the latest version, SNP-RFLPing 2.
Results: The primary aim of SNP-RFLPing 2 is to provide comprehensive PCR-RFLP information about SNPs with multiple functionalities, such as SNP retrieval for multiple species, different polymorphism types (bi-allelic, tri-allelic, tetra-allelic or indels), gene-centric searching, HapMap tagSNPs, gene ontology-based searching, miRNAs, and SNP500Cancer. The RFLP restriction enzymes and the corresponding PCR primers for the natural and mutagenic types of each SNP are analyzed simultaneously. Prices for all RFLP restriction enzymes are also provided to aid selection. Furthermore, the updating problems previously encountered with most SNP-related databases are resolved by an on-line retrieval system.
Conclusions: The user interfaces for functional SNP analyses have been substantially improved and integrated. SNP-RFLPing 2 offers a new and user-friendly interface for RFLP genotyping that can be used in association studies and is freely available at http://bio.kuas.edu.tw/snp-rflping2.
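The core idea behind restriction enzyme mining for a SNP can be illustrated with a minimal sketch: an enzyme is usable for PCR-RFLP genotyping when its recognition site is present with one allele and absent with the other. The code below is not the SNP-RFLPing 2 implementation; the function names, the four-enzyme table and the example flanking sequence are all assumptions chosen for illustration.

```python
# Recognition sites of a few common restriction enzymes. All four are
# palindromic, so scanning a single strand is sufficient.
ENZYME_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
}

def count_sites(seq, site):
    """Count occurrences of `site` in `seq`, allowing overlapping matches."""
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)

def discriminating_enzymes(left_flank, alleles, right_flank, enzymes=ENZYME_SITES):
    """Return {enzyme: {allele: site_count}} for enzymes that tell alleles apart."""
    hits = {}
    for name, site in enzymes.items():
        counts = {
            allele: count_sites((left_flank + allele + right_flank).upper(), site)
            for allele in alleles
        }
        # The enzyme is informative only if the cut pattern differs by allele.
        if len(set(counts.values())) > 1:
            hits[name] = counts
    return hits

# Hypothetical SNP with flanks: ACGTGAAT[T/C]CAGTTA
hits = discriminating_enzymes("ACGTGAAT", ("T", "C"), "CAGTTA")
print(hits)
```

In this made-up example the T allele completes a GAATTC site and the C allele destroys it, so only EcoRI is reported. A real tool would also need to handle degenerate recognition sequences, primer design and mutagenic primers, which this sketch leaves out.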
Affiliation(s)
- Hsueh-Wei Chang
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
|
18
|
Josephy PD, Kent M, Mannervik B. Single-nucleotide polymorphic variants of human glutathione transferase T1-1 differ in stability and functional properties. Arch Biochem Biophys 2009; 490:24-9. [DOI: 10.1016/j.abb.2009.07.025] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 06/04/2009] [Revised: 07/30/2009] [Accepted: 07/31/2009] [Indexed: 02/07/2023]
|
19
|
Wrensch M, Jenkins RB, Chang JS, Yeh RF, Xiao Y, Decker PA, Ballman KV, Berger M, Buckner JC, Chang S, Giannini C, Halder C, Kollmeyer TM, Kosel ML, LaChance DH, McCoy L, O'Neill BP, Patoka J, Pico AR, Prados M, Quesenberry C, Rice T, Rynearson AL, Smirnov I, Tihan T, Wiemels J, Yang P, Wiencke JK. Variants in the CDKN2B and RTEL1 regions are associated with high-grade glioma susceptibility. Nat Genet 2009; 41:905-8. [PMID: 19578366 PMCID: PMC2923561 DOI: 10.1038/ng.408] [Citation(s) in RCA: 407] [Impact Index Per Article: 27.1] [Received: 03/13/2009] [Accepted: 06/01/2009] [Indexed: 12/27/2022]
Abstract
The causes of glioblastoma and other gliomas remain obscure. To discover new candidate genes influencing glioma susceptibility, we conducted a principal component-adjusted genome-wide association study (GWAS) of 275,895 autosomal variants among 692 adult high-grade glioma cases (622 from the San Francisco Adult Glioma Study (AGS) and 70 from the Cancer Genome Atlas (TCGA)) and 3,992 controls (602 from AGS and 3,390 from Illumina iControlDB (iControls)). For replication, we analyzed the 13 SNPs with P < 10⁻⁶ using independent data from 176 high-grade glioma cases and 174 controls from the Mayo Clinic. On 9p21, rs1412829 near CDKN2B had discovery P = 3.4 × 10⁻⁸, replication P = 0.0038 and combined P = 1.85 × 10⁻¹⁰. On 20q13.3, rs6010620 intronic to RTEL1 had discovery P = 1.5 × 10⁻⁷, replication P = 0.00035 and combined P = 3.40 × 10⁻⁹. For both SNPs, the direction of association was the same in discovery and replication phases.
Affiliation(s)
- Margaret Wrensch
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, California, USA.
|
20
|
Analytical methods for inferring functional effects of single base pair substitutions in human cancers. Hum Genet 2009; 126:481-98. [PMID: 19434427 PMCID: PMC2762536 DOI: 10.1007/s00439-009-0677-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/07/2009] [Accepted: 04/29/2009] [Indexed: 02/08/2023]
Abstract
Cancer is a genetic disease that results from a variety of genomic alterations. Identification of some of these causal genetic events has enabled the development of targeted therapeutics and spurred efforts to discover the key genes that drive cancer formation. Rapidly improving sequencing and genotyping technology continues to generate increasingly large datasets that require analytical methods to identify functional alterations that deserve additional investigation. This review examines statistical and computational approaches for the identification of functional changes among sets of single-nucleotide substitutions. Frequency-based methods identify the most highly mutated genes in large-scale cancer sequencing efforts while bioinformatics approaches are effective for independent evaluation of both non-synonymous mutations and polymorphisms. We also review current knowledge and tools that can be utilized for analysis of alterations in non-protein-coding genomic sequence.
|
21
|
Hong MG, Pawitan Y, Magnusson PKE, Prince JA. Strategies and issues in the detection of pathway enrichment in genome-wide association studies. Hum Genet 2009; 126:289-301. [PMID: 19408013 DOI: 10.1007/s00439-009-0676-z] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.5] [Received: 02/24/2009] [Accepted: 04/21/2009] [Indexed: 01/15/2023]
Abstract
A fundamental question in human genetics is the degree to which the polygenic character of complex traits derives from polymorphism in genes with similar or with dissimilar functions. The many genome-wide association studies now being performed offer an opportunity to investigate this, and although early attempts are emerging, new tools and modeling strategies still need to be developed and deployed. Towards this goal, we implemented a new algorithm to facilitate the transition from genetic marker lists (principally those generated by PLINK) to pathway analyses of representational gene sets in either threshold or threshold-free downstream applications (e.g. DAVID, GSEA-P, and Ingenuity Pathway Analysis). This was applied to several large genome-wide association studies covering diverse human traits, including type 2 diabetes, Crohn's disease, and plasma lipid levels. Validation of this approach was obtained for plasma HDL levels, where functional categories related to lipid metabolism emerged as the most significant in two independent studies. From analyses of these samples, we highlight and address numerous issues related to this strategy, including appropriate gene-based correction statistics, the utility of imputed versus non-imputed marker sets, and the apparent enrichment of pathways due solely to the positional clustering of functionally related genes. The latter in particular emphasizes the importance of studies that directly tie genetic variation to functional characteristics of specific genes. The freely provided software, which we have called ProxyGeneLD, may resolve an important bottleneck in pathway-based analyses of genome-wide association data. This has allowed us to identify at least one replicable case of pathway enrichment but also to highlight functional gene clustering as a potentially serious problem that may lead to spurious pathway findings if not corrected.
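To make the notion of a gene-based correction concrete, the sketch below collapses marker-level P values into one value per gene by taking the smallest P among a gene's markers and applying a Šidák adjustment for the number of markers tested. This is a generic illustration and not the ProxyGeneLD algorithm; the function name, marker identifiers, gene labels and P values are all hypothetical.

```python
from collections import defaultdict

def gene_level_pvalues(marker_pvalues, marker_to_gene):
    """Collapse marker P values to one Sidak-adjusted P value per gene.

    Assumption: markers within a gene are treated as independent tests,
    which is conservative when they are in linkage disequilibrium.
    """
    by_gene = defaultdict(list)
    for marker, p in marker_pvalues.items():
        gene = marker_to_gene.get(marker)
        if gene is not None:  # skip markers that map to no gene
            by_gene[gene].append(p)
    # Sidak adjustment: probability that the minimum of n independent
    # uniform P values is at most p_min.
    return {
        gene: 1.0 - (1.0 - min(ps)) ** len(ps)
        for gene, ps in by_gene.items()
    }

# Hypothetical input: one small gene and one gene covered by four markers.
marker_pvalues = {"m1": 0.01, "m2": 0.01, "m3": 0.40, "m4": 0.75, "m5": 0.90}
marker_to_gene = {"m1": "GENE_A", "m2": "GENE_B", "m3": "GENE_B",
                  "m4": "GENE_B", "m5": "GENE_B"}

for gene, p in sorted(gene_level_pvalues(marker_pvalues, marker_to_gene).items()):
    print(f"{gene}: adjusted p = {p:.4f}")
```

Both genes share the same best marker P value, but the gene with more markers receives a larger adjusted value, which is the bias that a gene-based correction is meant to remove. Counting LD-independent marker groups instead of raw markers would be a natural refinement.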
Affiliation(s)
- Mun-Gwan Hong
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
|