1
Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017; 31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Sara Guirao-Rico
- Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Bellaterra, Spain
- Silvia Hinojosa-Alvarez
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Julio Rozas
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
2
Tang H, Thomas PD. Tools for Predicting the Functional Impact of Nonsynonymous Genetic Variation. Genetics 2016; 203:635-47. [PMID: 27270698 PMCID: PMC4896183 DOI: 10.1534/genetics.116.190033] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 09/04/2015] [Accepted: 04/01/2016] [Indexed: 01/09/2023] Open
Abstract
As personal genome sequencing becomes a reality, understanding the effects of genetic variants on phenotype (particularly the impact of germline variants on disease risk and the impact of somatic variants on cancer development and treatment) continues to increase in importance. Because of their clear potential for affecting phenotype, nonsynonymous genetic variants (variants that cause a change in the amino acid sequence of a protein encoded by a gene) have long been the target of efforts to predict the effects of genetic variation. Whole-genome sequencing is identifying large numbers of nonsynonymous variants in each genome, intensifying the need for computational methods that accurately predict which of these are likely to impact disease phenotypes. This review focuses on nonsynonymous variant prediction with two aims in mind: (1) to review the prioritization methods that have been developed to date and the principles on which they are based and (2) to discuss the challenges to further improving these methods.
Affiliation(s)
- Haiming Tang
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90033
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90033
3
Alternative approach to protein structure prediction based on sequential similarity of physical properties. Proc Natl Acad Sci U S A 2015; 112:5029-32. [PMID: 25848034 DOI: 10.1073/pnas.1504806112] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022] Open
Abstract
The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed to identify homologs in which residue equivalence is based exclusively on the pairwise physical property similarities of sequences. This approach, the property factor method (PFM), is entirely different from those in current use. A comparison is made between our method and PSI BLAST. We demonstrate that traditionally defined sequence similarity can be very low for pairs of sequences (which therefore cannot be identified using PSI BLAST), but similarity of physical property distributions results in almost identical 3D structures. The performance of PFM is shown to be better than that of PSI BLAST when sequence matching is comparable, based on a comparison using targets from CASP10 (89 targets) and CASP11 (51 targets). It is also shown that PFM outperforms PSI BLAST in informatically challenging targets.
4
Chu Y, Li W, Wang J, Liu G, Tang Y. Computational insights into the binding modes of Sr-Rex with cofactor NADH/NAD+ and operator DNA. J Mol Model 2013; 19:3143-51. [PMID: 23615679 DOI: 10.1007/s00894-013-1848-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/15/2012] [Accepted: 04/04/2013] [Indexed: 10/26/2022]
Abstract
The transcriptional repressor Rex plays key roles in modulating respiratory gene expression. It senses the redox poise of the NAD(H) pool. Rex from Streptomyces rimosus (Sr-Rex) is a newly identified protein. Its structure and its complexes with substrates have not yet been determined. In this study, three-dimensional (3D) structural models of the Sr-Rex dimer and its complexes with cofactors were constructed by homology modeling. The stability of the constructed Sr-Rex models and the detailed interactions between Sr-Rex and cofactors were further investigated by molecular dynamics simulations. The results demonstrated that the conformation of Sr-Rex changed substantially upon binding reduced NADH or oxidized NAD(+). Upon binding NADH, the Sr-Rex dimer adopted a more open conformation, which would weaken the interaction of Sr-Rex with Rex operator DNA (ROP). Key residues responsible for the binding were then identified. The computational results were consistent with experimental results, and hence provided insights into the molecular mechanism of Sr-Rex binding with ROP and NADH/NAD(+), which might be helpful for the development of biosensors.
Affiliation(s)
- Yanyan Chu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
5
Schein CH, Bowen DM, Lewis JA, Choi K, Paul A, van der Heden van Noort GJ, Lu W, Filippov DV. Physicochemical property consensus sequences for functional analysis, design of multivalent antigens and targeted antivirals. BMC Bioinformatics 2012; 13 Suppl 13:S9. [PMID: 23320474 PMCID: PMC3426803 DOI: 10.1186/1471-2105-13-s13-s9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 01/28/2023] Open
Abstract
Background: Analysis of large sets of biological sequence data from related strains or organisms is complicated by superficial redundancy in the set, which may contain many members that are identical except at one or two positions. Thus a new method, based on deriving physicochemical property (PCP)-consensus sequences, was tested for its ability to generate reference sequences and distinguish functionally significant changes from background variability. Methods: The PCP consensus program was used to automatically derive consensus sequences starting from sequence alignments of proteins from Flaviviruses (from the Flavitrack database) and human enteroviruses, using a five-dimensional set of eigenvectors that summarize over 200 different scalar values for the PCPs of the amino acids. A PCP-consensus protein of a Dengue virus envelope protein was produced recombinantly and tested for its ability to bind antibodies to strains using ELISA. Results: PCP-consensus sequences of the flavivirus family could be used to classify them into five discrete groups and distinguish areas of the envelope proteins that correlate with host specificity and disease type. A multivalent Dengue virus antigen was designed and shown to bind antibodies against all four DENV types. A consensus enteroviral VPg protein had the same distinctive high pKa as wild type proteins and was recognized by two different polymerases. Conclusions: The process for deriving PCP-consensus sequences for any group of aligned similar sequences has been validated for sequences with up to 50% diversity. Ongoing projects have shown that the method identifies residues that significantly alter PCPs at a given position, and might thus cause changes in function or immunogenicity. Other potential applications include deriving target proteins for drug design and diagnostic kits.
Affiliation(s)
- Catherine H Schein
- Institute for Translational Sciences, Computational Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Texas 77555-0857, USA.
6
Bowen DM, Lewis JA, Lu W, Schein CH. Simplifying complex sequence information: a PCP-consensus protein binds antibodies against all four Dengue serotypes. Vaccine 2012; 30:6081-7. [PMID: 22863657 DOI: 10.1016/j.vaccine.2012.07.042] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 04/09/2012] [Revised: 07/13/2012] [Accepted: 07/18/2012] [Indexed: 12/15/2022]
Abstract
Designing proteins that reflect the natural variability of a pathogen is essential for developing novel vaccines and drugs. Flaviviruses, including Dengue (DENV) and West Nile (WNV), evolve rapidly and can "escape" neutralizing monoclonal antibodies by mutation. Designing antigens that represent many distinct strains is important for DENV, where infection with a strain from one of the four serotypes may lead to severe hemorrhagic disease on subsequent infection with a strain from another serotype. Here, a DENV physicochemical property (PCP)-consensus sequence was derived from 671 unique sequences from the Flavitrack database. PCP-consensus proteins for domain 3 of the envelope protein (EdomIII) were expressed from synthetic genes in Escherichia coli. The ability of the purified consensus proteins to bind polyclonal antibodies generated in response to infection with strains from each of the four DENV serotypes was determined. The initial consensus protein bound antibodies from DENV-1-3 in ELISA and Western blot assays. This sequence was altered in 3 steps to incorporate regions of maximum variability, identified as significant changes in the PCPs, characteristic of DENV-4 strains. The final protein was recognized by antibodies against all four serotypes. Two amino acids essential for efficient binding to all DENV antibodies are part of a discontinuous epitope previously defined for a neutralizing monoclonal antibody. The PCP-consensus method can significantly reduce the number of experiments required to define a multivalent antigen, which is particularly important when dealing with pathogens that must be tested at higher biosafety levels.
Affiliation(s)
- David M Bowen
- Computational Biology, Sealy Center for Structural Biology and Molecular Biophysics, Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX 77555-0857, United States
7
Krishnadev O, Srinivasan N. AlignHUSH: alignment of HMMs using structure and hydrophobicity information. BMC Bioinformatics 2011; 12:275. [PMID: 21729312 PMCID: PMC3228556 DOI: 10.1186/1471-2105-12-275] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/10/2010] [Accepted: 07/05/2011] [Indexed: 11/10/2022] Open
Abstract
Background: Sensitive remote homology detection and accurate alignments, especially in the midnight zone of sequence similarity, are needed for better function annotation and structural modeling of proteins. An algorithm for HMM-HMM alignment, AlignHUSH, has been developed that is capable of recognizing distantly related domain families. The method uses structural information, in the form of predicted secondary structure probabilities, and hydrophobicity of amino acids to align HMMs of two sets of aligned sequences. The effect of using adjoining column(s) information has also been investigated and is found to increase the sensitivity of HMM-HMM alignments and remote homology detection. Results: We have assessed the performance of AlignHUSH using known evolutionary relationships available in SCOP. AlignHUSH performs better than the best HMM-HMM alignment methods and is observed to be even more sensitive at higher error rates. Accuracy of the alignments obtained using AlignHUSH has been assessed using the structure-based alignments available in BaliBASE. The alignment length and the alignment quality are found to be appropriate for homology modeling and function annotation. The alignment accuracy is found to be comparable to existing methods for profile-profile alignments. Conclusions: A new method to align HMMs has been developed and is shown to have better sensitivity at error rates of 10% and above when compared to other available programs. The proposed method could effectively aid in obtaining clues to the functions of proteins of as yet unknown function. A web-server incorporating the AlignHUSH method is available at http://crick.mbu.iisc.ernet.in/~alignhush/
Affiliation(s)
- Oruganty Krishnadev
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
8
Hu H, Roqueiro D, Dai Y. Prioritizing predicted cis-regulatory elements for co-expressed gene sets based on Lasso regression models. Annu Int Conf IEEE Eng Med Biol Soc 2011; 2011:6853-6856. [PMID: 22255913 DOI: 10.1109/iembs.2011.6091690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Computational prediction of cis-regulatory elements for a set of co-expressed genes based on sequence analysis provides an overwhelming volume of potential transcription factor binding sites. It presents a challenge to prioritize transcription factors for regulatory functional studies. A novel approach based on the use of Lasso regression models is proposed to address this problem. We examine the ability of the Lasso model using time-course microarray data obtained from a comprehensive study of gene expression profiles in skin and mucosal wounds in mouse over all stages of wound healing.
Affiliation(s)
- Hong Hu
- Department of Bioengineering (M/C 063), University of Illinois at Chicago, 851 S Morgan St, SEO 218, Chicago, IL 60607, USA.
9
Abstract
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
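The accuracy figure quoted in this abstract, the fraction of correctly aligned residues relative to a structure-based reference alignment, can be illustrated with a minimal sketch. It assumes that each alignment is given as two gapped strings of equal length and that "correct" means a residue pair appears in both the test and the reference alignment. The function names `aligned_pairs` and `fraction_correct` are hypothetical and are not taken from MODELLER or SALIGN.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (index_in_a, index_in_b) residue pairs placed in the
    same column of a pairwise alignment given as two gapped strings."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # ungapped residue indices in sequence a and sequence b
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces.
    Both arguments are (row_a, row_b) tuples of gapped strings."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


reference = ("ACDEF", "ACDEF")    # hypothetical structure-based alignment
test = ("AC-DEF", "ACD-EF")       # hypothetical sequence-based alignment
print(fraction_correct(test, reference))
```

In this toy case the test alignment misplaces one of the five reference pairs. Other definitions of alignment accuracy exist, so this sketch should not be read as the exact measure used in the benchmark.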
Affiliation(s)
- Marc A Marti-Renom
- Mission Bay Genentech Hall, University of California, San Francisco, San Francisco, CA 94143, USA.
10
Abstract
BACKGROUND: Pattern matching is the core of bioinformatics; it is used in database searching, restriction enzyme mapping, and finding open reading frames. It is done repeatedly over increasingly long sequences, so code must be efficient and insensitive to sequence length. Patterns of interest include simple motifs with IUPAC degeneracies, regular expressions, patterns allowing mismatches, and probability matrices. RESULTS: I describe a small application which allows searching for all the above pattern types individually, and which further allows these atomic motifs to be assembled into logical rules for more sophisticated analysis. CONCLUSION: tacg is small, portable, faster and more capable than most alternatives, relatively easy to modify, and freely available in source code.
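To make "simple motifs with IUPAC degeneracies" concrete, here is a minimal sketch that expands a degenerate nucleotide motif into a regular expression and reports every match position. It is an illustration only and not how tacg is implemented; the function names `iupac_to_regex` and `find_motif` are hypothetical, while the ambiguity table follows the standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate a degenerate motif such as 'GANTC' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return 0-based start positions of all matches, including overlaps.
    A zero-width lookahead is used so that overlapping sites are reported."""
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# EcoRI site (GAATTC) in a short hypothetical sequence.
print(find_motif("GGATCCGAATTC", "GAATTC"))
# Degenerate motif: purine, pyrimidine, then G.
print(find_motif("ACGTACGT", "RYG"))
```

This sketch searches one strand only and does not handle mismatches or probability matrices, which the abstract lists as separate pattern types.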
11
Hofmann K, Tomiuk S, Wolff G, Stoffel W. Cloning and characterization of the mammalian brain-specific, Mg2+-dependent neutral sphingomyelinase. Proc Natl Acad Sci U S A 2000; 97:5895-900. [PMID: 10823942 PMCID: PMC18530 DOI: 10.1073/pnas.97.11.5895] [Citation(s) in RCA: 241] [Impact Index Per Article: 10.0] [Indexed: 11/18/2022] Open
Abstract
The enzymatic breakdown of sphingomyelin by sphingomyelinases is considered the major source of the second messenger ceramide. Studies on the contribution of the various described acidic and neutral sphingomyelinases to the signaling pool of ceramide have been hampered by the lack of molecular data on the neutral sphingomyelinases (nSMases). We recently identified a mammalian nSMase, an integral membrane protein with remote similarity to bacterial sphingomyelinases. However, its ubiquitous expression pattern is in contrast to previous findings that sphingomyelinase activity is found mainly in brain tissues. By using an improved database search method, combined with phylogenetic analysis, we identified a second mammalian nSMase (nSMase2) with predominant expression in the brain. The sphingomyelinase activity of nSMase2 has a neutral pH optimum, depends on Mg(2+) ions, and is activated by unsaturated fatty acids and phosphatidylserine. Immunofluorescence reveals a neuron-specific punctate perinuclear staining, which colocalizes with a Golgi marker in a number of cell lines. The likely identity of nSMase2 with cca1, a rat protein involved in contact inhibition of 3Y1 fibroblasts, suggests a role for this enzyme in cell cycle arrest. Both mammalian nSMases are members of a superfamily of Mg(2+)-dependent phosphohydrolases, which also contains nucleases, inositol phosphatases, and bacterial toxins.
Affiliation(s)
- K Hofmann
- Bioinformatics and Gene Discovery Group, MEMOREC Stoffel GmbH, D-50829 Cologne, Germany
12
Ghosh D. OOTFD (Object-Oriented Transcription Factors Database): an object-oriented successor to TFD. Nucleic Acids Res 1998; 26:360-2. [PMID: 9399874 PMCID: PMC147249 DOI: 10.1093/nar/26.1.360] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 02/05/2023] Open
Abstract
ooTFD (object-oriented Transcription Factors Database) is a successor to TFD (Transcription Factors Database). ooTFD contains information represented in TFD but also allows the representation of containment, composite, and interaction relationships between transcription factor polypeptides. ooTFD is designed to represent information about all transcription factors, both eukaryotic and prokaryotic, basal as well as regulatory factors, and multiprotein complexes as well as monomers. ooTFD and associated tools and services can be accessed at http://www.isbi.net/
Affiliation(s)
- D Ghosh
- Institute for Transcriptional Informatics, PO Box 2556, Pittsburgh, PA 15230, USA.
13
Neuwald AF, Liu JS, Lipman DJ, Lawrence CE. Extracting protein alignment models from the sequence database. Nucleic Acids Res 1997; 25:1665-77. [PMID: 9108146 PMCID: PMC146639 DOI: 10.1093/nar/25.9.1665] [Citation(s) in RCA: 180] [Impact Index Per Article: 6.7] [Indexed: 02/04/2023] Open
Abstract
Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those, often subtly, conserved patterns essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences.
Affiliation(s)
- A F Neuwald
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.