1
|
Douville C, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, Karchin R. Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-Indel). Hum Mutat 2016; 37:28-35. [PMID: 26442818 PMCID: PMC5057310 DOI: 10.1002/humu.22911] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Received: 07/09/2015] [Accepted: 09/14/2015] [Indexed: 12/11/2022]
Abstract
Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method.
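The Boolean meta-predictor idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the classifier names `A` to `D`, the toy calls, and the function names are all hypothetical, and only pure AND and pure OR combinations over subsets are enumerated (a simplification of "all possible" Boolean combinations).

```python
from itertools import combinations

def sens_spec(pred, truth):
    """Return (sensitivity, specificity) for boolean predictions vs. truth."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    pos = sum(truth)
    neg = len(truth) - pos
    return tp / pos, tn / neg

def enumerate_meta_predictors(calls, truth):
    """Score every pure-AND and pure-OR combination of two or more classifiers.

    calls: dict mapping a (hypothetical) classifier name to a list of
    booleans, True meaning "predicted pathogenic".
    """
    results = {}
    names = sorted(calls)
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            rows = list(zip(*(calls[n] for n in subset)))
            for label, op in (("AND", all), ("OR", any)):
                pred = [op(row) for row in rows]
                results[(label, subset)] = sens_spec(pred, truth)
    return results

# Toy data (invented): three pathogenic indels followed by three benign ones.
truth = [True, True, True, False, False, False]
calls = {
    "A": [True, True, False, True, False, False],
    "B": [True, True, True, True, True, False],
    "C": [True, False, True, False, True, False],
    "D": [True, True, True, False, False, True],
}
results = enumerate_meta_predictors(calls, truth)
```

On this toy data the conjunction of all four classifiers trades sensitivity for specificity, while the disjunction does the opposite, which is the trade-off a meta-predictor search has to balance.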
Affiliation(s)
- Christopher Douville
- Department of Biomedical Engineering and Institute for Computational Medicine, The Johns Hopkins University, Baltimore, Maryland
- David L. Masica
- Department of Biomedical Engineering and Institute for Computational Medicine, The Johns Hopkins University, Baltimore, Maryland
- Peter D. Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
- David N. Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
- Rick Kim
- In Silico Solutions, Fairfax, Virginia
- Rachel Karchin
- Department of Biomedical Engineering and Institute for Computational Medicine, The Johns Hopkins University, Baltimore, Maryland
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
|
2
|
Analyzing effects of naturally occurring missense mutations. Computational and Mathematical Methods in Medicine 2012; 2012:805827. [PMID: 22577471 PMCID: PMC3346971 DOI: 10.1155/2012/805827] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.8] [Received: 12/21/2011] [Revised: 02/01/2012] [Accepted: 02/01/2012] [Indexed: 11/17/2022]
Abstract
A single-point mutation in the genome, for example a single-nucleotide polymorphism (SNP) or a rare genetic mutation, is the change of one nucleotide for another in the genome sequence. Some of these mutations produce an amino acid substitution in the corresponding protein sequence (missense mutations); others do not. This paper focuses on genetic mutations resulting in a change in the amino acid sequence of the corresponding protein and how to assess their effects on protein wild-type characteristics. The existing methods and approaches for predicting the effects of mutation on protein stability, structure, and dynamics are outlined and discussed with respect to their underlying principles. Available resources, either as stand-alone applications or webservers, are pointed out as well. It is emphasized that understanding the molecular mechanisms behind the effects of these missense mutations is of critical importance for detecting disease-causing mutations. The paper also provides several examples of the application of 3D structure-based methods to model the effects of missense mutations on protein stability and protein-protein interactions.
|
3
|
McClendon CL, Hua L, Barreiro A, Jacobson MP. Comparing Conformational Ensembles Using the Kullback-Leibler Divergence Expansion. J Chem Theory Comput 2012; 8:2115-2126. [PMID: 23316121 DOI: 10.1021/ct300008d] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Indexed: 11/29/2022]
Abstract
We present a thermodynamical approach to identify changes in macromolecular structure and dynamics in response to perturbations such as mutations or ligand binding, using an expansion of the Kullback-Leibler Divergence that connects local population shifts in torsion angles to changes in the free energy landscape of the protein. While the Kullback-Leibler Divergence is a known formula from information theory, the novelty and power of our implementation lies in its formal developments, connection to thermodynamics, statistical filtering, ease of visualization of results, and extendability by adding higher-order terms. We present a formal derivation of the Kullback-Leibler Divergence expansion and then apply our method at a first-order approximation to molecular dynamics simulations of four protein systems where ligand binding or pH titration is known to cause an effect at a distant site. Our results qualitatively agree with experimental measurements of local changes in structure or dynamics, such as NMR chemical shift perturbations and hydrogen-deuterium exchange mass spectrometry. The approach produces easy-to-analyze results with low background, and as such has the potential to become a routine analysis when molecular dynamics simulations in two or more conditions are available. Our method is implemented in the MutInf code package and is available on the SimTK website at https://simtk.org/home/mutinf.
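The first-order idea in this abstract, comparing the population of a single torsion angle between two conditions, can be sketched as a Kullback-Leibler divergence between two binned distributions. This is an illustrative sketch only and not the MutInf code: the bin count, the pseudocount, the function names, and the toy angle samples are all assumptions, and the paper's statistical filtering and higher-order terms are not reproduced.

```python
import math

def torsion_histogram(angles_deg, n_bins=12, pseudocount=1.0):
    """Bin torsion angles (degrees) on a periodic [-180, 180) axis.

    A pseudocount is added to every bin (an assumption of this sketch)
    so that no bin has zero probability.
    """
    counts = [pseudocount] * n_bins
    width = 360.0 / n_bins
    for a in angles_deg:
        wrapped = (a + 180.0) % 360.0
        # The extra modulo guards against a float result of exactly 360.0.
        counts[int(wrapped // width) % n_bins] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy samples (invented): one torsion angle in a reference and a perturbed run.
reference = [-60.0] * 50 + [60.0] * 50
perturbed = [-60.0] * 90 + [60.0] * 10

p_ref = torsion_histogram(reference)
p_pert = torsion_histogram(perturbed)
kl_same = kl_divergence(p_ref, p_ref)
kl_shift = kl_divergence(p_pert, p_ref)
```

Summing such per-torsion values over the torsions of a residue gives a per-residue measure of the local population shift, which is the kind of quantity a first-order approximation reports.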
|
4
|
Neighborhood properties are important determinants of temperature sensitive mutations. PLoS One 2011; 6:e28507. [PMID: 22164302 PMCID: PMC3229608 DOI: 10.1371/journal.pone.0028507] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/05/2011] [Accepted: 11/09/2011] [Indexed: 02/08/2023] Open
Abstract
Temperature-sensitive (TS) mutants are powerful tools to study gene function in vivo. These mutants exhibit wild-type activity at permissive temperatures and reduced activity at restrictive temperatures. Although random mutagenesis can be used to generate TS mutants, the procedure is laborious and unfeasible in multicellular organisms. Further, the underlying molecular mechanisms of the TS phenotype are poorly understood. To elucidate TS mechanisms, we used a machine learning method, logistic regression, to investigate a large number of sequence and structure features. We developed and tested 133 features, describing properties of either the mutation site or the mutation site neighborhood. We defined three types of neighborhood using sequence distance, Euclidean distance, and topological distance. We discovered that neighborhood features outperformed mutation site features in predicting TS mutations. The most predictive features suggest that TS mutations tend to occur at buried and rigid residues, and are located at conserved protein domains. The environment of a buried residue often determines the overall structural stability of a protein, and thus may lead to a reversible activity change upon temperature switch. We developed TS prediction models based on logistic regression and the Lasso regularized procedure. Through ten-fold cross-validation, we obtained an area under the curve of 0.91 for the model using both sequence and structure features. Testing on independent datasets suggested that the model predicted TS mutations with 50% precision. In summary, our study elucidated the molecular basis of TS mutants and suggested the importance of neighborhood properties in determining TS mutations. We further developed models to predict TS mutations derived from single amino acid substitutions. In this way, TS mutants can be efficiently obtained by experimentally introducing the predicted mutations.
|
5
|
Fuzzy oil drop model to interpret the structure of antifreeze proteins and their mutants. J Mol Model 2011; 18:229-37. [PMID: 21523554 PMCID: PMC3249532 DOI: 10.1007/s00894-011-1033-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 12/18/2010] [Accepted: 03/07/2011] [Indexed: 12/04/2022]
Abstract
Mutations in proteins introduce structural changes and influence biological activity; the specific effects depend on the location of the mutation. The simple method proposed in the present paper is based on a two-step model of in silico protein folding. The structure of the first intermediate is assumed to be determined solely by backbone conformation. The structure of the second one is assumed to be determined by the presence of a hydrophobic center. A comparative structural analysis of the set of mutants is performed to identify the mutation-induced structural changes. Changes in the hydrophobic core organization, measured by the divergence entropy, allow a quantitative comparison that estimates the relative structural changes upon mutation. The set of antifreeze proteins, which appeared to have a hydrophobic core structure in accordance with the “fuzzy oil drop” model, was selected for analysis.
|
6
|
Kelly L, Fukushima H, Karchin R, Gow JM, Chinn LW, Pieper U, Segal MR, Kroetz DL, Sali A. Functional hot spots in human ATP-binding cassette transporter nucleotide binding domains. Protein Sci 2011; 19:2110-21. [PMID: 20799350 DOI: 10.1002/pro.491] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/05/2023]
Abstract
The human ATP-binding cassette (ABC) transporter superfamily consists of 48 integral membrane proteins that couple the action of ATP binding and hydrolysis to the transport of diverse substrates across cellular membranes. Defects in 18 transporters have been implicated in human disease. In hundreds of cases, disease phenotypes and defects in function can be traced to nonsynonymous single nucleotide polymorphisms (nsSNPs). The functional impact of the majority of ABC transporter nsSNPs has yet to be experimentally characterized. Here, we combine experimental mutational studies with sequence and structural analysis to describe the impact of nsSNPs in human ABC transporters. First, the disease associations of 39 nsSNPs in 10 transporters were rationalized by identifying two conserved loops and a small α-helical region that may be involved in interdomain communication necessary for transport of substrates. Second, an approach to discriminate between disease-associated and neutral nsSNPs was developed and tailored to this superfamily. Finally, the functional impact of 40 unannotated nsSNPs in seven ABC transporters identified in 247 ethnically diverse individuals studied by the Pharmacogenetics of Membrane Transporters consortium was predicted. Three predictions were experimentally tested using human embryonic kidney epithelial (HEK) 293 cells stably transfected with the reference multidrug resistance transporter 4 and its variants to examine functional differences in transport of the antiviral drug, tenofovir. The experimental results confirmed two predictions. Our analysis provides a structural and evolutionary framework for rationalizing and predicting the functional effects of nsSNPs in this clinically important membrane transporter superfamily.
Affiliation(s)
- Libusha Kelly
- Graduate Group in Bioinformatics, University of California at San Francisco, San Francisco, California, USA
|
7
|
Cline MS, Karchin R. Using bioinformatics to predict the functional impact of SNVs. Bioinformatics 2011; 27:441-8. [PMID: 21159622 PMCID: PMC3105482 DOI: 10.1093/bioinformatics/btq695] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 10/20/2010] [Revised: 11/21/2010] [Accepted: 12/12/2010] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The past decade has seen the introduction of fast and relatively inexpensive methods to detect genetic variation across the genome and exponential growth in the number of known single nucleotide variants (SNVs). There is increasing interest in bioinformatics approaches to identify variants that are functionally important from millions of candidate variants. Here, we describe the essential components of bioinformatics tools that predict functional SNVs. RESULTS: Bioinformatics tools have great potential to identify functional SNVs, but the black box nature of many tools can be a pitfall for researchers. Understanding the underlying methods, assumptions and biases of these tools is essential to their intelligent application.
Affiliation(s)
- Melissa S Cline
- Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, CA, USA
|
8
|
Mort M, Evani US, Krishnan VG, Kamati KK, Baenziger PH, Bagchi A, Peters BJ, Sathyesh R, Li B, Sun Y, Xue B, Shah NH, Kann MG, Cooper DN, Radivojac P, Mooney SD. In silico functional profiling of human disease-associated and polymorphic amino acid substitutions. Hum Mutat 2010; 31:335-46. [PMID: 20052762 DOI: 10.1002/humu.21192] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
An important challenge in translational bioinformatics is to understand how genetic variation gives rise to molecular changes at the protein level that can precipitate both monogenic and complex disease. To this end, we compiled datasets of human disease-associated amino acid substitutions (AAS) in the contexts of inherited monogenic disease, complex disease, functional polymorphisms with no known disease association, and somatic mutations in cancer, and compared them with respect to predicted functional sites in proteins. Using the sequence homology-based tool SIFT to estimate the proportion of deleterious AAS in each dataset, only complex disease AAS were found to be indistinguishable from neutral polymorphic AAS. Monogenic disease AAS predicted to be nondeleterious by SIFT were characterized by a significant enrichment for inherited AAS within solvent accessible residues, regions of intrinsic protein disorder, and an association with the loss or gain of various posttranslational modifications. Sites of structural and/or functional interest were therefore surmised to constitute useful additional features with which to identify the molecular disruptions caused by deleterious AAS. A range of bioinformatic tools, designed to predict structural and functional sites in protein sequences, were then employed to demonstrate that intrinsic biases exist in terms of the distribution of different types of human AAS with respect to specific structural, functional and pathological features. Our Web tool, designed to potentiate the functional profiling of novel AAS, has been made available at http://profile.mutdb.org/.
Affiliation(s)
- Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
|
9
|
Zhang Z, Teng S, Wang L, Schwartz CE, Alexov E. Computational analysis of missense mutations causing Snyder-Robinson syndrome. Hum Mutat 2010; 31:1043-9. [PMID: 20556796 PMCID: PMC2932761 DOI: 10.1002/humu.21310] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Indexed: 01/08/2023]
Abstract
The Snyder-Robinson syndrome is caused by missense mutations in the spermine synthase gene that encodes a protein (SMS) of 529 amino acids. Here we investigate, in silico, the molecular effect of three missense mutations, c.267G>A (p.G56S), c.496T>G (p.V132G), and c.550T>C (p.I150T) in SMS that were clinically identified to cause the disease. Single-point energy calculations, molecular dynamics simulations, and pKa calculations revealed the effects of these mutations on SMS's stability, flexibility, and interactions. It was predicted that the catalytic residue, Asp276, should be protonated prior to binding the substrates. The pKa calculations indicated the p.I150T mutation causes pKa changes with respect to the wild-type SMS, which involve titratable residues interacting with the S-methyl-5'-thioadenosine (MTA) substrate. The p.I150T missense mutation was also found to decrease the stability of the C-terminal domain and to induce structural changes in the vicinity of the MTA binding site. The other two missense mutations, p.G56S and p.V132G, are away from the active site and do not perturb its wild-type properties, but affect the stability of both the monomers and the dimer. Specifically, the p.G56S mutation is predicted to greatly reduce the affinity of monomers to form a dimer, and therefore should have a dramatic effect on SMS function because dimerization is essential for SMS activity.
Affiliation(s)
- Zhe Zhang
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634
- Shaolei Teng
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC 29634
- Liangjiang Wang
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC 29634
- J.C. Self Research Institute of Human Genetics, Greenwood Genetic Center, Greenwood, SC 29646
- Charles E. Schwartz
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC 29634
- J.C. Self Research Institute of Human Genetics, Greenwood Genetic Center, Greenwood, SC 29646
- Emil Alexov
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634
|
10
|
McClendon CL, Friedland G, Mobley DL, Amirkhani H, Jacobson MP. Quantifying Correlations Between Allosteric Sites in Thermodynamic Ensembles. J Chem Theory Comput 2009; 5:2486-2502. [PMID: 20161451 PMCID: PMC2790287 DOI: 10.1021/ct9001812] [Citation(s) in RCA: 173] [Impact Index Per Article: 11.5] [Indexed: 11/29/2022]
Abstract
Allostery describes altered protein function at one site due to a perturbation at another site. One mechanism of allostery involves correlated motions, which can occur even in the absence of substantial conformational change. We present a novel method, "MutInf", to identify statistically significant correlated motions from equilibrium molecular dynamics simulations. Our approach analyzes both backbone and sidechain motions using internal coordinates to account for the gear-like twists that can take place even in the absence of the large conformational changes typical of traditional allosteric proteins. We quantify correlated motions using a mutual information metric, which we extend to incorporate data from multiple short simulations and to filter out correlations that are not statistically significant. Applying our approach to uncover mechanisms of cooperative small molecule binding in human interleukin-2, we identify clusters of correlated residues from 50 ns of molecular dynamics simulations. Interestingly, two of the clusters with the strongest correlations highlight known cooperative small-molecule binding sites and show substantial correlations between these sites. These cooperative binding sites on interleukin-2 are correlated not only through the hydrophobic core of the protein but also through a dynamic polar network of hydrogen bonding and electrostatic interactions. Since this approach identifies correlated conformations in an unbiased, statistically robust manner, it should be a useful tool for finding novel or "orphan" allosteric sites in proteins of biological and therapeutic importance.
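The mutual information metric mentioned in this abstract can be sketched for two discretized torsion-angle time series. This is a plain plug-in estimator written for illustration, not the MutInf implementation: the function name and the toy state sequences are assumptions, and the paper's handling of multiple short simulations and its significance filtering are not reproduced.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples.

    I(X;Y) = sum_{x,y} p(x,y) * ln( p(x,y) / (p(x) * p(y)) )
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

# Toy state sequences (invented): each value is a torsion-angle bin index
# observed for one residue in successive simulation frames.
coupled_a = [0, 1] * 50
coupled_b = [0, 1] * 50          # moves in lockstep with coupled_a
indep_a = [0, 0, 1, 1] * 25
indep_b = [0, 1, 0, 1] * 25      # every joint state equally likely

mi_coupled = mutual_information(coupled_a, coupled_b)
mi_indep = mutual_information(indep_a, indep_b)
```

Two perfectly coupled two-state torsions give ln 2 nats, while the independent pair gives zero. With finite sampling, real uncorrelated pairs produce small positive values, which is why a significance filter of the kind the abstract describes is needed in practice.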
Affiliation(s)
- Christopher L McClendon
- University of California San Francisco, Graduate Group in Biophysics and Department of Pharmaceutical Chemistry
|
11
|
Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, Kinzler KW, Vogelstein B, Karchin R. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res 2009; 69:6660-7. [PMID: 19654296 DOI: 10.1158/0008-5472.can-09-1133] [Citation(s) in RCA: 331] [Impact Index Per Article: 22.1] [Indexed: 11/16/2022]
Abstract
Large-scale sequencing of cancer genomes has uncovered thousands of DNA alterations, but the functional relevance of the majority of these mutations to tumorigenesis is unknown. We have developed a computational method, called Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM), to identify and prioritize those missense mutations most likely to generate functional changes that enhance tumor cell proliferation. The method has high sensitivity and specificity when discriminating between known driver missense mutations and randomly generated missense mutations (area under receiver operating characteristic curve, >0.91; area under Precision-Recall curve, >0.79). CHASM substantially outperformed previously described missense mutation function prediction methods at discriminating known oncogenic mutations in P53 and the tyrosine kinase epidermal growth factor receptor. We applied the method to 607 missense mutations found in a recent glioblastoma multiforme sequencing study. Based on a model that assumed the glioblastoma multiforme mutations are a mixture of drivers and passengers, we estimate that 8% of these mutations are drivers, causally contributing to tumorigenesis.
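The area under the ROC curve quoted in this abstract has a simple pairwise interpretation: it is the probability that a randomly chosen driver mutation receives a higher score than a randomly chosen passenger. A minimal sketch of that calculation follows; the function name and the toy scores are invented for illustration and have no connection to CHASM's actual scores.

```python
def roc_auc(driver_scores, passenger_scores):
    """Pairwise ROC AUC: fraction of (driver, passenger) pairs in which
    the driver scores higher, counting ties as half a win."""
    wins = 0.0
    for d in driver_scores:
        for p in passenger_scores:
            if d > p:
                wins += 1.0
            elif d == p:
                wins += 0.5
    return wins / (len(driver_scores) * len(passenger_scores))

# Toy scores (invented), higher meaning "more driver-like".
drivers = [0.9, 0.8, 0.4]
passengers = [0.5, 0.3, 0.1]
auc = roc_auc(drivers, passengers)
```

Here eight of the nine pairs are ordered correctly, so the AUC is 8/9. The quadratic loop is fine for small sets; a rank-based formula would be the usual choice for large ones.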
Affiliation(s)
- Hannah Carter
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland 21218, USA
|
12
|
Carvalho M, Pino MA, Karchin R, Beddor J, Godinho-Netto M, Mesquita RD, Rodarte RS, Vaz DC, Monteiro VA, Manoukian S, Colombo M, Ripamonti CB, Rosenquist R, Suthers G, Borg A, Radice P, Grist SA, Monteiro ANA, Billack B. Analysis of a set of missense, frameshift, and in-frame deletion variants of BRCA1. Mutat Res 2008; 660:1-11. [PMID: 18992264 DOI: 10.1016/j.mrfmmm.2008.09.017] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 04/02/2008] [Revised: 08/12/2008] [Accepted: 09/27/2008] [Indexed: 12/19/2022]
Abstract
Germline mutations that inactivate BRCA1 are responsible for breast and ovarian cancer susceptibility. One possible outcome of genetic testing for BRCA1 is the finding of a genetic variant of uncertain significance for which there is no information regarding its cancer association. This outcome leads to problems in risk assessment, counseling and preventive care. The purpose of the present study was to functionally evaluate seven unclassified variants of BRCA1 including a genomic deletion that leads to the in-frame loss of exons 16/17 (Delta exons 16/17) in the mRNA, an insertion that leads to a frameshift and an extended carboxy-terminus (5673insC), and five missense variants (K1487R, S1613C, M1652I, Q1826H and V1833M). We analyzed the variants using a functional assay based on the transcription activation property of BRCA1 combined with supervised learning computational models. Functional analysis indicated that variants S1613C, Q1826H, and M1652I are likely to be neutral, whereas variants V1833M, Delta exons 16/17, and 5673insC are likely to represent deleterious variants. In agreement with the functional analysis, the results of the computational analysis also indicated that the latter three variants are likely to be deleterious. Taken together, a combined approach of functional and bioinformatics analysis, plus structural modeling, can be utilized to obtain valuable information pertaining to the effect of a rare variant on the structure and function of BRCA1. Such information can, in turn, aid in the classification of BRCA1 variants for which there is a lack of genetic information needed to provide reliable risk assessment.
Affiliation(s)
- Marcelo Carvalho
- H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
|
13
|
George Priya Doss C, Rajasekaran R, Sudandiradoss C, Ramanathan K, Purohit R, Sethumadhavan R. A novel computational and structural analysis of nsSNPs in CFTR gene. Genomic Med 2008; 2:23-32. [PMID: 18716917 DOI: 10.1007/s11568-008-9019-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 02/15/2008] [Accepted: 04/25/2008] [Indexed: 11/24/2022] Open
Abstract
Single Nucleotide Polymorphisms (SNPs) are being intensively studied to understand the biological basis of complex traits and diseases. The genetics of human phenotype variation could be understood by knowing the functions of SNPs. In this study, using computational methods, we analyzed the genetic variations that can alter the expression and function of the CFTR gene, the candidate gene responsible for cystic fibrosis. We applied an evolutionary perspective to screen the SNPs using the sequence homology-based SIFT tool, which suggested that 17 nsSNPs (44%) were deleterious. The structure-based PolyPhen server suggested that 26 nsSNPs (66%) may disrupt protein function and structure. The PupaSuite tool predicted the phenotypic effect of SNPs on the structure and function of the affected protein. Structural analysis was carried out on the major mutation in the native protein encoded by the CFTR gene, F508C, for the nsSNP with ID rs1800093. The amino acid residues in the native and mutant modeled protein were further analyzed for solvent accessibility, secondary structure and stabilizing residues to check the stability of the proteins. The SNPs were further subjected to iHAP analysis to identify htSNPs, and we report potential candidates for future studies on CFTR mutations.
Affiliation(s)
- C George Priya Doss
- Bioinformatics Division, School of Biotechnology, Chemical and Biomedical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
|
14
|
Karchin R, Monteiro ANA, Tavtigian SV, Carvalho MA, Sali A. Functional impact of missense variants in BRCA1 predicted by supervised learning. PLoS Comput Biol 2006; 3:e26. [PMID: 17305420 PMCID: PMC1797820 DOI: 10.1371/journal.pcbi.0030026] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Received: 09/05/2006] [Accepted: 12/27/2006] [Indexed: 11/19/2022] Open
Abstract
Many individuals tested for inherited cancer susceptibility at the BRCA1 gene locus are discovered to have variants of unknown clinical significance (UCVs). Most UCVs cause a single amino acid residue (missense) change in the BRCA1 protein. They can be biochemically assayed, but such evaluations are time-consuming and labor-intensive. Computational methods that classify and suggest explanations for UCV impact on protein function can complement functional tests. Here we describe a supervised learning approach to classification of BRCA1 UCVs. Using a novel combination of 16 predictive features, the algorithms were applied to retrospectively classify the impact of 36 BRCA1 C-terminal (BRCT) domain UCVs biochemically assayed to measure transactivation function and to blindly classify 54 documented UCVs. Majority vote of three supervised learning algorithms is in agreement with the assay for more than 94% of the UCVs. Two UCVs found deleterious by both the assay and the classifiers reveal a previously uncharacterized putative binding site. Clinicians may soon be able to use computational classifiers such as those described here to better inform patients. These classifiers can be adapted to other cancer susceptibility genes and systematically applied to prioritize the growing number of potential causative loci and variants found by large-scale disease association studies.
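The majority-vote step described in this abstract, and the comparison of the consensus call with an assay result, can be sketched as follows. The three classifiers, their calls, and the assay labels below are invented toy data, and the function names are hypothetical; the sketch shows the voting arithmetic only, not the authors' classifiers or features.

```python
def majority_vote(votes):
    """True if more than half of the boolean votes are True."""
    return sum(votes) > len(votes) / 2

def consensus_agreement(calls_by_classifier, assay):
    """Return the per-variant consensus calls and the fraction that
    agree with the assay labels."""
    consensus = [majority_vote(v) for v in zip(*calls_by_classifier)]
    matches = sum(c == a for c, a in zip(consensus, assay))
    return consensus, matches / len(assay)

# Toy data (invented): True means "deleterious". One row per classifier,
# one column per variant.
calls_by_classifier = [
    [True, False, True, False],
    [True, True, False, False],
    [False, True, True, False],
]
assay = [True, True, False, False]

consensus, agreement = consensus_agreement(calls_by_classifier, assay)
```

With an odd number of voters there are no ties, which is one practical reason to combine three classifiers rather than two or four.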
Affiliation(s)
- Rachel Karchin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Institute of Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- * To whom correspondence should be addressed. E-mail: (RK); (AS)
- Alvaro N. A Monteiro
- Risk Assessment, Detection, and Intervention Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida, United States of America
- Marcelo A Carvalho
- Risk Assessment, Detection, and Intervention Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida, United States of America
- Andrej Sali
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biomedical Research, University of California San Francisco, San Francisco, California, United States of America
- * To whom correspondence should be addressed. E-mail: (RK); (AS)
|
15
|
Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score for predicting errors in protein structure models. Protein Sci 2006; 15:1653-66. [PMID: 16751606 PMCID: PMC2242555 DOI: 10.1110/ps.062095806] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Indexed: 10/24/2022]
Abstract
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
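The RMSD-difference criterion used in this abstract can be made concrete with a short sketch: for each target, take the model that the score ranks best, and subtract the lowest RMSD available in that target's model set. The sketch assumes that a lower score means a better model, and the function names and toy numbers are invented; it does not reproduce SVMod or any of the individual scores.

```python
def delta_rmsd(scores, rmsds):
    """RMSD of the model with the best (lowest) score minus the lowest
    RMSD in the set. Zero means the score picked the most accurate model."""
    picked = min(range(len(scores)), key=lambda i: scores[i])
    return rmsds[picked] - min(rmsds)

def mean_delta_rmsd(targets):
    """Average the per-target differences over (scores, rmsds) pairs."""
    values = [delta_rmsd(s, r) for s, r in targets]
    return sum(values) / len(values)

# Toy data (invented): two targets, three candidate models each.
# Each pair is (assessment scores, RMSDs to the native structure).
targets = [
    ([0.2, 0.1, 0.5], [1.0, 2.5, 4.0]),   # score picks the 2.5 model
    ([0.3, 0.9, 0.6], [1.5, 3.0, 2.0]),   # score picks the best model
]
per_target = [delta_rmsd(s, r) for s, r in targets]
average = mean_delta_rmsd(targets)
```

A scoring function is better by this measure when the average difference is smaller, which is how the reported drop in the average can be read.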
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, Department of Biopharmaceutical Sciences, University of California at San Francisco 94158, USA
|
16
|
Yue P, Melamud E, Moult J. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics 2006; 7:166. [PMID: 16551372 PMCID: PMC1435944 DOI: 10.1186/1471-2105-7-166] [Citation(s) in RCA: 316] [Impact Index Per Article: 17.6] [Received: 11/03/2005] [Accepted: 03/22/2006] [Indexed: 11/25/2022] Open
Abstract
Background: The relationship between disease susceptibility and genetic variation is complex, and many different types of data are relevant. We describe a web resource and database that provides and integrates as much information as possible on disease/gene relationships at the molecular level. Description: The resource has three primary modules. One module identifies which genes are candidates for involvement in a specified disease. A second module provides information about the relationships between sets of candidate genes. The third module analyzes the likely impact of non-synonymous SNPs on protein function. Disease/candidate gene relationships and gene-gene relationships are derived from the literature using simple but effective text profiling. SNP/protein function relationships are derived by two methods, one using principles of protein structure and stability, the other based on sequence conservation. Entries for each gene include a number of links to other data, such as expression profiles, pathway context, mouse knockout information and papers. Gene-gene interactions are presented in an interactive graphical interface, providing rapid access to the underlying information, as well as convenient navigation through the network. Use of the resource is illustrated with aspects of the inflammatory response and hypertension. Conclusion: The combination of SNP impact analysis, a knowledge-based network of gene relationships and candidate genes, and access to a wide range of data and literature allows a user to quickly assimilate available information, and so develop models of gene-pathway-disease interaction.
Affiliation(s)
- Peng Yue
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA
- Molecular and Cellular Biology Program, University of Maryland, College Park, MD 20742, USA
- Eugene Melamud
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA
- Molecular and Cellular Biology Program, University of Maryland, College Park, MD 20742, USA
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA
|