1
Miller M, Vitale D, Kahn PC, Rost B, Bromberg Y. funtrp: identifying protein positions for variation driven functional tuning. Nucleic Acids Res 2020; 47:e142. [PMID: 31584091 PMCID: PMC6868392 DOI: 10.1093/nar/gkz818] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/25/2019] [Revised: 09/05/2019] [Accepted: 09/12/2019] [Indexed: 12/12/2022]
Abstract
Evaluating the impact of non-synonymous genetic variants is essential for uncovering disease associations and mechanisms of evolution. An in-depth understanding of sequence changes is also fundamental for synthetic protein design and stability assessments. However, the variant effect predictor performance gain observed in recent years has not kept up with the increased complexity of new methods. One likely reason for this might be that most approaches use similar sets of gene and protein features for modeling variant effects, often emphasizing sequence conservation. While high levels of conservation highlight residues essential for protein activity, much of the variation observable in vivo is arguably weaker in its impact, thus requiring evaluation at a higher level of resolution. Here, we describe function Neutral/Toggle/Rheostat predictor (funtrp), a novel computational method that categorizes protein positions based on the position-specific expected range of mutational impacts: Neutral (weak/no effects), Rheostat (function-tuning positions), or Toggle (on/off switches). We show that position types do not correlate strongly with familiar protein features such as conservation or protein disorder. We also find that position type distribution varies across different protein functions. Finally, we demonstrate that position types can improve performance of existing variant effect predictors and suggest a way forward for the development of new ones.
Affiliation(s)
- Maximilian Miller
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08901, USA
- Daniel Vitale
  - Columbian College of Arts and Sciences Data Science Program, Corcoran Hall, 725 21st Street NW, Washington, DC 20052, USA
- Peter C Kahn
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08901, USA
- Burkhard Rost
  - Department for Bioinformatics and Computational Biology, Technische Universität München, Boltzmannstr. 3, 85748 Garching/Munich, Germany; Institute for Advanced Study at Technische Universität München (TUM-IAS), Lichtenbergstraße 2a, 85748 Garching/Munich, Germany
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08901, USA; Institute for Advanced Study at Technische Universität München (TUM-IAS), Lichtenbergstraße 2a, 85748 Garching/Munich, Germany; Department of Genetics, Rutgers University, Human Genetics Institute, Life Sciences Building, 145 Bevier Road, Piscataway, NJ 08854, USA
2
Gambhir SS, Ge TJ, Vermesh O, Spitler R. Toward achieving precision health. Sci Transl Med 2019; 10:10/430/eaao3612. [PMID: 29491186 DOI: 10.1126/scitranslmed.aao3612] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 07/12/2017] [Accepted: 02/08/2018] [Indexed: 01/19/2023]
Abstract
Health care systems primarily focus on patients after they present with disease, not before. The emerging field of precision health encourages disease prevention and earlier detection by monitoring health and disease based on an individual's risk. Active participation in health care can be encouraged with continuous health-monitoring devices, providing a higher-resolution picture of human health and disease. However, the development of monitoring technologies must prioritize the collection of actionable data and long-term user engagement.
Affiliation(s)
- Sanjiv Sam Gambhir
  - Department of Radiology, Molecular Imaging Program at Stanford, Stanford University School of Medicine, Stanford, CA 94305, USA; Canary Center at Stanford for Cancer Early Detection, Stanford University School of Medicine, Palo Alto, CA 94304, USA; Department of Bioengineering and Department of Materials Science and Engineering, Stanford University, Stanford, CA 94305, USA
- T Jessie Ge
  - Department of Radiology, Molecular Imaging Program at Stanford, Stanford University School of Medicine, Stanford, CA 94305, USA; Precision Health and Integrated Diagnostics Center, Stanford University, Stanford, CA 94305, USA
- Ophir Vermesh
  - Department of Radiology, Molecular Imaging Program at Stanford, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ryan Spitler
  - Department of Radiology, Molecular Imaging Program at Stanford, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Bioengineering and Department of Materials Science and Engineering, Stanford University, Stanford, CA 94305, USA
3
Martell HJ, Wong KA, Martin JF, Kassam Z, Thomas K, Wass MN. Associating mutations causing cystinuria with disease severity with the aim of providing precision medicine. BMC Genomics 2017; 18:550. [PMID: 28812535 PMCID: PMC5558187 DOI: 10.1186/s12864-017-3913-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
Abstract
Background: Cystinuria is an inherited disease that results in the formation of cystine stones in the kidney, which can have serious health complications. Two genes (SLC7A9 and SLC3A1) that form an amino acid transporter are known to be responsible for the disease. Variants that cause the disease disrupt amino acid transport across the cell membrane, leading to the build-up of relatively insoluble cystine, resulting in formation of stones. Assessing the effects of each mutation is critical in order to provide tailored treatment options for patients. We used various computational methods to assess the effects of cystinuria associated mutations, utilising information on protein function, evolutionary conservation and natural population variation of the two genes. We also analysed the ability of some methods to predict the phenotypes of individuals with cystinuria, based on their genotypes, and compared this to clinical data.
Results: Using a literature search, we collated a set of 94 SLC3A1 and 58 SLC7A9 point mutations known to be associated with cystinuria. There are differences in sequence location, evolutionary conservation, allele frequency, and predicted effect on protein function between these mutations and other genetic variants of the same genes that occur in a large population. Structural analysis considered how these mutations might lead to cystinuria. For SLC7A9, many mutations swap hydrophobic amino acids for charged amino acids or vice versa, while others affect known functional sites. For SLC3A1, functional information is currently insufficient to make confident predictions but mutations often result in the loss of hydrogen bonds and largely appear to affect protein stability. Finally, we showed that computational predictions of mutation severity were significantly correlated with the disease phenotypes of patients from a clinical study, despite different methods disagreeing for some of their predictions.
Conclusions: The results of this study are promising and highlight the areas of research which must now be pursued to better understand how mutations in SLC3A1 and SLC7A9 cause cystinuria. The application of our approach to a larger data set is essential, but we have shown that computational methods could play an important role in designing more effective personalised treatment options for patients with cystinuria.
Affiliation(s)
- Henry J Martell
  - School of Biosciences, University of Kent, Canterbury, Kent, CT2 7NJ, UK
- Kathie A Wong
  - Urology Centre, Guy's and St. Thomas' NHS Foundation Trust, London, SE1 9RT, UK
- Juan F Martin
  - School of Biosciences, University of Kent, Canterbury, Kent, CT2 7NJ, UK
- Ziyan Kassam
  - Urology Centre, Guy's and St. Thomas' NHS Foundation Trust, London, SE1 9RT, UK
- Kay Thomas
  - Urology Centre, Guy's and St. Thomas' NHS Foundation Trust, London, SE1 9RT, UK
- Mark N Wass
  - School of Biosciences, University of Kent, Canterbury, Kent, CT2 7NJ, UK
4
Computational predictors fail to identify amino acid substitution effects at rheostat positions. Sci Rep 2017; 7:41329. [PMID: 28134345 PMCID: PMC5278360 DOI: 10.1038/srep41329] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 10/11/2016] [Accepted: 12/15/2016] [Indexed: 12/31/2022]
Abstract
Many computational approaches exist for predicting the effects of amino acid substitutions. Here, we considered whether the protein sequence position class - rheostat or toggle - affects these predictions. The classes are defined as follows: experimentally evaluated effects of amino acid substitutions at toggle positions are binary, while rheostat positions show progressive changes. For substitutions in the LacI protein, all evaluated methods failed two key expectations: toggle neutrals were incorrectly predicted as more non-neutral than rheostat non-neutrals, while toggle and rheostat neutrals were incorrectly predicted to be different. However, toggle non-neutrals were distinct from rheostat neutrals. Since many toggle positions are conserved, and most rheostats are not, predictors appear to annotate position conservation better than mutational effect. This finding can explain the well-known observation that predictors assign disproportionate weight to conservation, as well as the field's inability to improve predictor performance. Thus, building reliable predictors requires distinguishing between rheostat and toggle positions.
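The toggle/rheostat distinction described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `classify_position`, the 0-to-1 activity scale (0 = inactive, 1 = wild-type-like), and all thresholds are assumptions chosen purely for illustration. A position is called a toggle when measured substitution outcomes are essentially binary, and a rheostat when a noticeable share of substitutions gives intermediate activity.

```python
from typing import Sequence


def classify_position(effects: Sequence[float],
                      low: float = 0.2,
                      high: float = 0.8,
                      max_mid_fraction: float = 0.1) -> str:
    """Toy toggle/rheostat call for one protein position.

    effects: hypothetical activity of each substitution at the position,
             scaled so 0.0 is inactive and 1.0 is wild-type-like.
    low, high, max_mid_fraction: illustrative cut-offs, not published values.
    """
    if not effects:
        raise ValueError("need at least one measured substitution")
    # Outcomes strictly between the two cut-offs count as intermediate.
    n_mid = sum(1 for e in effects if low < e < high)
    mid_fraction = n_mid / len(effects)
    # Binary outcomes (few or no intermediates) suggest a toggle;
    # progressive, graded outcomes suggest a rheostat.
    return "toggle" if mid_fraction <= max_mid_fraction else "rheostat"


if __name__ == "__main__":
    print(classify_position([0.0, 0.05, 0.1, 1.0]))       # binary outcomes
    print(classify_position([0.1, 0.3, 0.5, 0.7, 0.9]))   # graded outcomes
```

With the assumed defaults, the first call has no intermediate outcomes and the second has three of five, so the two positions fall into different classes. Real data would need a principled choice of scale and cut-offs.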
5
Rost B, Radivojac P, Bromberg Y. Protein function in precision medicine: deep understanding with machine learning. FEBS Lett 2016; 590:2327-41. [PMID: 27423136 PMCID: PMC5937700 DOI: 10.1002/1873-3468.12307] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 06/27/2016] [Revised: 07/12/2016] [Accepted: 07/12/2016] [Indexed: 12/21/2022]
Abstract
Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both.
Affiliation(s)
- Burkhard Rost
  - Department of Informatics and Bioinformatics, Institute for Advanced Studies, Technical University of Munich, Garching, Germany
- Predrag Radivojac
  - School of Informatics and Computing, Indiana University, Bloomington, IN, USA
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
6
Bromberg Y, Capriotti E, Carter H. VarI-SIG 2015: methods for personalized medicine - the role of variant interpretation in research and diagnostics. BMC Genomics 2016; 17 Suppl 2:425. [PMID: 27357578 PMCID: PMC4928159 DOI: 10.1186/s12864-016-2721-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Affiliation(s)
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, Lipman Hall 218, 08901, New Brunswick, NJ, USA; Department of Genetics, Rutgers University, Lipman Hall 218, 08901, New Brunswick, NJ, USA
- Emidio Capriotti
  - Institute for Mathematical Modeling of Biological Systems, Department of Biology, Heinrich Heine University Düsseldorf, Universitaetsstr. 1, 40225, Düsseldorf, Germany
- Hannah Carter
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, 9500 Gilman Dr., 92093, La Jolla, CA, USA
7
Genomic Copy Number Variation Affecting Genes Involved in the Cell Cycle Pathway: Implications for Somatic Mosaicism. Int J Genomics 2015; 2015:757680. [PMID: 26421275 PMCID: PMC4569762 DOI: 10.1155/2015/757680] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/11/2015] [Accepted: 07/27/2015] [Indexed: 12/20/2022]
Abstract
Somatic genome variations (mosaicism) seem to represent a common mechanism for human intercellular/interindividual diversity in health and disease. However, origins and mechanisms of somatic mosaicism remain a matter of conjecture. Recently, it has been hypothesized that zygotic genomic variation naturally occurring in humans is likely to predispose to nonheritable genetic changes (aneuploidy) acquired during the lifetime through affecting cell cycle regulation, genome stability maintenance, and related pathways. Here, we have evaluated genomic copy number variation (CNV) in genes implicated in the cell cycle pathway (according to Kyoto Encyclopedia of Genes and Genomes/KEGG) within a cohort of patients with intellectual disability, autism, and/or epilepsy, in which the phenotype was not associated with genomic rearrangements altering this pathway. Benign CNVs affecting 20 genes of the cell cycle pathway were detected in 161 out of 255 patients (71.6%). Among them, 62 individuals exhibited >2 CNVs affecting the cell cycle pathway. Taking into account the number of individuals demonstrating CNV of these genes, a support for this hypothesis appears to be presented. Accordingly, we speculate that further studies of CNV burden across the genes implicated in related pathways might clarify whether zygotic genomic variation generates somatic mosaicism in health and disease.
8
MaPSeq, A Service-Oriented Architecture for Genomics Research within an Academic Biomedical Research Institution. INFORMATICS 2015. [DOI: 10.3390/informatics2030020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
9
Chung JH, Cai J, Suskin BG, Zhang Z, Coleman K, Morrow BE. Whole-Genome Sequencing and Integrative Genomic Analysis Approach on Two 22q11.2 Deletion Syndrome Family Trios for Genotype to Phenotype Correlations. Hum Mutat 2015; 36:797-807. [PMID: 25981510 DOI: 10.1002/humu.22814] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 01/15/2015] [Accepted: 05/01/2015] [Indexed: 12/20/2022]
Abstract
The 22q11.2 deletion syndrome (22q11DS) affects 1:4,000 live births and presents with highly variable phenotype expressivity. In this study, we developed an analytical approach utilizing whole-genome sequencing (WGS) and integrative analysis to discover genetic modifiers. Our pipeline combined available tools in order to prioritize rare, predicted deleterious, coding and noncoding single-nucleotide variants (SNVs), and insertion/deletions from WGS. We sequenced two unrelated probands with 22q11DS, with contrasting clinical findings, and their unaffected parents. Proband P1 had cognitive impairment, psychotic episodes, anxiety, and tetralogy of Fallot (TOF), whereas proband P2 had juvenile rheumatoid arthritis but no other major clinical findings. In P1, we identified common variants in COMT and PRODH on 22q11.2 as well as rare potentially deleterious DNA variants in other behavioral/neurocognitive genes. We also identified a de novo SNV in ADNP2 (NM_014913.3:c.2243G>C), encoding a neuroprotective protein that may be involved in behavioral disorders. In P2, we identified a novel nonsynonymous SNV in ZFPM2 (NM_012082.3:c.1576C>T), a known causative gene for TOF, which may act as a protective variant downstream of TBX1, haploinsufficiency of which is responsible for congenital heart disease in individuals with 22q11DS.
Affiliation(s)
- Jonathan H Chung
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York
- Jinlu Cai
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York
- Barrie G Suskin
  - Department of Obstetrics & Gynecology and Women's Health, Montefiore Medical Center, Bronx, New York
- Zhengdong Zhang
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York
- Karlene Coleman
  - Children's Healthcare of Atlanta at Egleston, Atlanta, Georgia
- Bernice E Morrow
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York
10
Bromberg Y, Capriotti E. VarI-SIG 2014--From SNPs to variants: interpreting different types of genetic variants. BMC Genomics 2015; 16 Suppl 8:I1. [PMID: 26110281 PMCID: PMC4480323 DOI: 10.1186/1471-2164-16-s8-i1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/12/2023]
11
Pielaat A, Boer MP, Wijnands LM, van Hoek AHAM, Bouw E, Barker GC, Teunis PFM, Aarts HJM, Franz E. First step in using molecular data for microbial food safety risk assessment; hazard identification of Escherichia coli O157:H7 by coupling genomic data with in vitro adherence to human epithelial cells. Int J Food Microbiol 2015; 213:130-8. [PMID: 25910947 PMCID: PMC4613885 DOI: 10.1016/j.ijfoodmicro.2015.04.009] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/05/2015] [Revised: 03/31/2015] [Accepted: 04/03/2015] [Indexed: 12/11/2022]
Abstract
The potential for using whole genome sequencing (WGS) data in microbiological risk assessment (MRA) has been discussed on several occasions since the beginning of this century. Still, the proposed heuristic approaches have never been applied in a practical framework. This is due to the non-trivial problem of mapping microbial information consisting of thousands of loci onto a probabilistic scale for risks. The paradigm change for MRA involves translation of multidimensional microbial genotypic information to much reduced (integrated) phenotypic information and onwards to a single measure of human risk (i.e. probability of illness). In this paper a first approach in methodology development is described for the application of WGS data in MRA; this is supported by a practical example. That is, combining genetic data (single nucleotide polymorphisms; SNPs) for Shiga toxin-producing Escherichia coli (STEC) O157 with phenotypic data (in vitro adherence to epithelial cells as a proxy for virulence) leads to hazard identification in a Genome Wide Association Study (GWAS). This application revealed practical implications when using SNP data for MRA. These can be summarized by considering the following main issues: optimum sample size for valid inference on population level, correction for population structure, quantification and calibration of results, reproducibility of the analysis, links with epidemiological data, anchoring and integration of results into a systems biology approach for the translation of molecular studies to human health risk. Future developments in genetic data analysis for MRA should aim at resolving the mapping problem of processing genetic sequences to come to a quantitative description of risk. The development of a clustering scheme focusing on biologically relevant information of the microbe involved would be a useful approach in molecular data reduction for risk assessment.
Affiliation(s)
- Annemarie Pielaat
  - National Institute for Public Health and the Environment (RIVM), Centre for Infectious Disease Control, A. van Leeuwenhoeklaan 9, 3720 BA Bilthoven, The Netherlands
- Martin P Boer
  - Wageningen UR Biometris, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Lucas M Wijnands
  - National Institute for Public Health and the Environment (RIVM), Centre for Infectious Disease Control, A. van Leeuwenhoeklaan 9, 3720 BA Bilthoven, The Netherlands
- Angela H A M van Hoek
  - National Institute for Public Health and the Environment (RIVM), Centre for Infectious Disease Control, A. van Leeuwenhoeklaan 9, 3720 BA Bilthoven, The Netherlands
- El Bouw
  - National Institute for Public Health and the Environment (RIVM), Centre for Infectious Disease Control, A. van Leeuwenhoeklaan 9, 3720 BA Bilthoven, The Netherlands
- Gary C Barker
  - IFR, Institute of Food Research, Norwich Research Park, Norwich, UK
- Peter F M Teunis
  - National Institute for Public Health and the Environment (RIVM), Centre for Infectious Disease Control, A. van Leeuwenhoeklaan 9, 3720 BA Bilthoven, The Netherlands; Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Henk J M Aarts
  - National Institute for Public Health and the Environment (RIVM), Centre for Infectious Disease Control, A. van Leeuwenhoeklaan 9, 3720 BA Bilthoven, The Netherlands
- Eelco Franz
  - National Institute for Public Health and the Environment (RIVM), Centre for Infectious Disease Control, A. van Leeuwenhoeklaan 9, 3720 BA Bilthoven, The Netherlands
12
Katsonis P, Koire A, Wilson SJ, Hsu TK, Lua RC, Wilkins AD, Lichtarge O. Single nucleotide variations: biological impact and theoretical interpretation. Protein Sci 2014; 23:1650-66. [PMID: 25234433 PMCID: PMC4253807 DOI: 10.1002/pro.2552] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 08/06/2014] [Revised: 09/12/2014] [Accepted: 09/15/2014] [Indexed: 12/27/2022]
Abstract
Genome-wide association studies (GWAS) and whole-exome sequencing (WES) generate massive amounts of genomic variant information, and a major challenge is to identify which variations drive disease or contribute to phenotypic traits. Because the majority of known disease-causing mutations are exonic non-synonymous single nucleotide variations (nsSNVs), most studies focus on whether these nsSNVs affect protein function. Computational studies show that the impact of nsSNVs on protein function reflects sequence homology and structural information and predict the impact through statistical methods, machine learning techniques, or models of protein evolution. Here, we review impact prediction methods and discuss their underlying principles, their advantages and limitations, and how they compare to and complement one another. Finally, we present current applications and future directions for these methods in biological research and medical genetics.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Amanda Koire
  - Department of Structural and Computational Biology and Molecular Biophysics, Houston, Texas
- Stephen Joseph Wilson
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
- Teng-Kuei Hsu
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
- Rhonald C Lua
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Angela Dawn Wilkins
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Department of Structural and Computational Biology and Molecular Biophysics, Houston, Texas
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
  - Department of Pharmacology, Baylor College of Medicine, Houston, Texas
13
Okser S, Pahikkala T, Airola A, Salakoski T, Ripatti S, Aittokallio T. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet 2014; 10:e1004754. [PMID: 25393026 PMCID: PMC4230844 DOI: 10.1371/journal.pgen.1004754] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Indexed: 01/14/2023]
Affiliation(s)
- Sebastian Okser
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Tapio Pahikkala
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Antti Airola
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Tapio Salakoski
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Samuli Ripatti
  - Hjelt Institute, University of Helsinki, Helsinki, Finland
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Tero Aittokallio
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
14
Haas J, Barb I, Katus HA, Meder B. Targeted next-generation sequencing: the clinician's stethoscope for genetic disorders. Per Med 2014; 11:581-592. [PMID: 29758803 DOI: 10.2217/pme.14.40] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/30/2022]
Abstract
Genetic biomarkers are crucial for diagnosis, guiding of treatments and estimation of prognosis. In the past, clinical genetic diagnostics was limited by the sequencing information gained from selected exons and single genes. For genetically heterogeneous diseases, such as cardiomyopathies, where underlying mutations in more than 1000 exons are known, a Sanger-based comprehensive test would have been extremely expensive and labor intensive. Next-generation sequencing has overcome these problems in terms of costs, speed and throughput. In this review we discuss available methods for targeted next-generation sequencing that ease the introduction of this technology into routine clinical application. We further provide results of a study we have performed to compare two state-of-the-art methods for their enrichment efficiency and detection accuracy of variants in a clinical setting.
Affiliation(s)
- Jan Haas
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Germany
- Ioana Barb
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Germany
- Hugo A Katus
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Germany
- Benjamin Meder
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Germany
15
Yang J, Wang Y, Shen H, Yang W. In silico identification and experimental validation of insertion-deletion polymorphisms in tomato genome. DNA Res 2014; 21:429-38. [PMID: 24618211 PMCID: PMC4131836 DOI: 10.1093/dnares/dsu008] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 12/19/2013] [Accepted: 02/04/2014] [Indexed: 11/14/2022]
Abstract
Comparative analysis of the genome sequences of Solanum lycopersicum variety Heinz 1706 and S. pimpinellifolium accession LA 1589 using MUGSY software identified 145 695 insertion-deletion (InDel) polymorphisms. A selected set of 3029 candidate InDels (≥2 bp) across the entire tomato genome were subjected to PCR validation, and 82.4% could be verified. Of 2272 polymorphic InDels between LA 1589 and Heinz 1706, 61.6, 45.2, and 31.6% were polymorphic in 8 accessions of S. pimpinellifolium, 4 accessions of S. lycopersicum var. cerasiforme, and 10 varieties of S. lycopersicum, respectively. Genetic distance was 0.216 in S. pimpinellifolium, 0.202 in S. lycopersicum var. cerasiforme, and 0.108 in S. lycopersicum. The data suggested a reduction of genetic variation from S. pimpinellifolium to S. lycopersicum var. cerasiforme and S. lycopersicum. Cluster analysis showed that the 8 accessions of S. pimpinellifolium were in one group, whereas 4 accessions of S. lycopersicum var. cerasiforme and 10 varieties of S. lycopersicum were in the same group.
Collapse
Affiliation(s)
- Jingjing Yang
  - Beijing Key Laboratory of Growth and Developmental Regulation for Protected Vegetable Crops, Department of Vegetable Science, China Agricultural University, No. 2 Yuanmingyuan Xilu, Beijing 100193, China
- Yuanyuan Wang
  - Beijing Key Laboratory of Growth and Developmental Regulation for Protected Vegetable Crops, Department of Vegetable Science, China Agricultural University, No. 2 Yuanmingyuan Xilu, Beijing 100193, China
- Huolin Shen
  - Beijing Key Laboratory of Growth and Developmental Regulation for Protected Vegetable Crops, Department of Vegetable Science, China Agricultural University, No. 2 Yuanmingyuan Xilu, Beijing 100193, China
- Wencai Yang
  - Beijing Key Laboratory of Growth and Developmental Regulation for Protected Vegetable Crops, Department of Vegetable Science, China Agricultural University, No. 2 Yuanmingyuan Xilu, Beijing 100193, China
16
Advances in Human Biology: Combining Genetics and Molecular Biophysics to Pave the Way for Personalized Diagnostics and Medicine. ACTA ACUST UNITED AC 2014. [DOI: 10.1155/2014/471836] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/17/2022]
Abstract
Advances in several biology-oriented initiatives such as genome sequencing and structural genomics, along with the progress made through traditional biological and biochemical research, have opened up a unique opportunity to better understand the molecular effects of human diseases. Human DNA can vary significantly from person to person and determines an individual’s physical characteristics and their susceptibility to diseases. Armed with an individual’s DNA sequence, researchers and physicians can check for defects known to be associated with certain diseases by utilizing various databases. However, for unclassified DNA mutations, or to reveal the molecular mechanism behind the effects, the mutations have to be mapped onto the corresponding networks and macromolecular structures and then analyzed to reveal their effect on the wild-type properties of the biological processes involved. Predicting the effect of DNA mutations on an individual’s health is typically referred to as personalized or companion diagnostics. Furthermore, once the molecular mechanism of the mutations is revealed, the patient should be given the drugs most appropriate for the individual genome, an approach referred to as pharmacogenomics. Altogether, the shift in focus in medicine towards more genomic-oriented practices is the foundation of personalized medicine. The progress made in these rapidly developing fields is outlined.
17
Li MJ, Yan B, Sham PC, Wang J. Exploring the function of genetic variants in the non-coding genomic regions: approaches for identifying human regulatory variants affecting gene expression. Brief Bioinform 2014; 16:393-412. [PMID: 24916300 DOI: 10.1093/bib/bbu018] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 01/22/2014] [Accepted: 04/23/2014] [Indexed: 12/13/2022] Open
Abstract
Understanding the genetic basis of human traits/diseases and the underlying mechanisms of how these traits/diseases are affected by genetic variations is critical for public health. Current genome-wide functional genomics data have uncovered a large number of functional elements in the noncoding regions of the human genome, providing new opportunities to study regulatory variants (RVs). RVs play important roles in transcription factor binding, chromatin states and epigenetic modifications. Here, we systematically review an array of methods currently used to map RVs as well as the computational approaches for annotating and interpreting their regulatory effects, with emphasis on regulatory single-nucleotide polymorphisms. We also briefly introduce experimental methods to validate these functional RVs.
18
Abstract
The bioinformatics requirements within the clinical environment are very specific, and analytic techniques need to be fit for purpose, robust, and predictable. At the same time, the bewildering amount of information produced during these analyses needs to be carefully managed, used and interpreted correctly. The challenge for clinical laboratories now is to implement production analytical processes that are capable of handling different experimental approaches on current equipment, as well as to incorporate ways for these systems to evolve to take account of developments likely to make impacts in the near future. This is complicated by the many options available at each of the critical processing steps, and a clear method needs to be developed to assemble appropriate pipelines. Here, I discuss the issues relevant to the development of an informatics pipeline that meets these criteria, which should allow individual laboratories to assess their proposed strategies.
Affiliation(s)
- Richard James Nigel Allcock
  - School of Pathology and Laboratory Medicine, University of Western Australia, M574 Stirling Highway, Nedlands, WA, 6009, Australia
19
Wang S, Xing J. A primer for disease gene prioritization using next-generation sequencing data. Genomics Inform 2013; 11:191-9. [PMID: 24465230 PMCID: PMC3897846 DOI: 10.5808/gi.2013.11.4.191] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/18/2013] [Revised: 11/18/2013] [Accepted: 11/21/2013] [Indexed: 01/21/2023] Open
Abstract
High-throughput next-generation sequencing (NGS) technology produces a tremendous amount of raw sequence data. The challenges for researchers are to process the raw data, to map the sequences to the genome, to discover variants that are different from the reference genome, and to prioritize/rank the variants for the question of interest. The recent development of many computational algorithms and programs has vastly improved the ability to translate sequence data into valuable information for disease gene identification. However, NGS data analysis is complex and could be overwhelming for researchers who are not familiar with the process. Here, we outline the analysis pipeline and describe some of the most commonly used principles and tools for analyzing NGS data for disease gene identification.
Affiliation(s)
- Shuoguo Wang
  - Department of Genetics, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jinchuan Xing
  - Department of Genetics, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
20
Alexov E, Sternberg M. Understanding molecular effects of naturally occurring genetic differences. J Mol Biol 2013; 425:3911-3. [PMID: 23968859 DOI: 10.1016/j.jmb.2013.08.013] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Affiliation(s)
- Emil Alexov
  - Department of Physics, Clemson University, Clemson, SC 29634, USA
21
Garaffo G, Provero P, Molineris I, Pinciroli P, Peano C, Battaglia C, Tomaiuolo D, Etzion T, Gothilf Y, Santoro M, Merlo GR. Profiling, Bioinformatic, and Functional Data on the Developing Olfactory/GnRH System Reveal Cellular and Molecular Pathways Essential for This Process and Potentially Relevant for the Kallmann Syndrome. Front Endocrinol (Lausanne) 2013; 4:203. [PMID: 24427155 PMCID: PMC3876029 DOI: 10.3389/fendo.2013.00203] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/26/2013] [Accepted: 12/18/2013] [Indexed: 11/28/2022] Open
Abstract
During embryonic development, immature neurons in the olfactory epithelium (OE) extend axons through the nasal mesenchyme, to contact projection neurons in the olfactory bulb. Axon navigation is accompanied by migration of the GnRH+ neurons, which enter the anterior forebrain and home in the septo-hypothalamic area. This process can be interrupted at various points and lead to the onset of the Kallmann syndrome (KS), a disorder characterized by anosmia and central hypogonadotropic hypogonadism. Several genes have been identified in humans and mice that cause KS or a KS-like phenotype. In mice, a set of transcription factors appears to be required for olfactory connectivity and GnRH neuron migration; thus we explored the transcriptional network underlying this developmental process by profiling the OE and the adjacent mesenchyme at three embryonic ages. We also profiled the OE from embryos null for Dlx5, a homeogene that causes a KS-like phenotype when deleted. We identified 20 interesting genes belonging to the following categories: (1) transmembrane adhesion/receptor, (2) axon-glia interaction, (3) scaffold/adapter for signaling, and (4) synaptic proteins. We tested some of them in zebrafish embryos: the depletion of five (of six) Dlx5 targets affected axonal extension and targeting, while three (of three) affected GnRH neuron position and neurite organization. Thus, we confirmed the importance of cell-cell and cell-matrix interactions and identified new molecules needed for olfactory connection and GnRH neuron migration. Using available and newly generated data, we predicted/prioritized putative KS-disease genes by building conserved co-expression networks with all known disease genes in human and mouse. The results show the overall validity of approaches based on high-throughput data and predictive bioinformatics to identify genes potentially relevant for the molecular pathogenesis of KS. A number of candidates that should be tested in future mutation screens will be discussed.
Affiliation(s)
- Giulia Garaffo
  - Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Paolo Provero
  - Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Ivan Molineris
  - Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Patrizia Pinciroli
  - Department of Medical Biotechnology Translational Medicine (BIOMETRA), University of Milano, Milano, Italy
- Clelia Peano
  - Institute of Biomedical Technology, National Research Council, ITB-CNR, Segrate, Italy
- Cristina Battaglia
  - Department of Medical Biotechnology Translational Medicine (BIOMETRA), University of Milano, Milano, Italy
  - Institute of Biomedical Technology, National Research Council, ITB-CNR, Segrate, Italy
- Daniela Tomaiuolo
  - Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Talya Etzion
  - The George S. Wise Faculty of Life Sciences, Department of Neurobiology, Tel-Aviv University, Tel-Aviv, Israel
- Yoav Gothilf
  - The George S. Wise Faculty of Life Sciences, Department of Neurobiology, Tel-Aviv University, Tel-Aviv, Israel
- Massimo Santoro
  - Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
- Giorgio R. Merlo
  - Department of Molecular Biotechnology and Health Science, University of Torino, Torino, Italy
  - *Correspondence: Giorgio R. Merlo, Department of Molecular Biotechnology and Health Science, University of Torino, Via Nizza 52, Torino 10126, Italy e-mail: