Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Siddiqui AS, Barton GJ. Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions. Protein Sci 1995;4:872-84. [PMID: 7663343 PMCID: PMC2143117 DOI: 10.1002/pro.5560040507] [Citation(s) in RCA: 105] [Impact Index Per Article: 3.6] [Indexed: 01/26/2023]

Cited by Other Article(s)

Lau AM, Kandathil SM, Jones DT. Merizo: a rapid and accurate protein domain segmentation method using invariant point attention. Nat Commun 2023;14:8445. [PMID: 38114456 PMCID: PMC10730818 DOI: 10.1038/s41467-023-43934-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 11/24/2023] [Indexed: 12/21/2023] [Open Access]

Sidhanta SPD, Sowdhamini R, Srinivasan N. Comparative analysis of permanent and transient domain-domain interactions in multi-domain proteins. Proteins 2023. [PMID: 37828826 DOI: 10.1002/prot.26581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/09/2023] [Accepted: 08/11/2023] [Indexed: 10/14/2023]

Taheri-Ledari M, Zandieh A, Shariatpanahi SP, Eslahchi C. Assignment of structural domains in proteins using diffusion kernels on graphs. BMC Bioinformatics 2022;23:369. [PMID: 36076174 PMCID: PMC9461149 DOI: 10.1186/s12859-022-04902-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Accepted: 08/23/2022] [Indexed: 11/10/2022] [Open Access]

Abstract

Although algorithmic approaches to protein domain decomposition have long attracted interest, the inherent ambiguity of the problem keeps it an active area of research. Moreover, accurate automated methods are in high demand as the number of solved structures of complex proteins rises. While the majority of previous efforts to decompose 3D structures have centered on developing clustering algorithms, the use of enhanced measures of proximity between amino acids has remained largely unexplored. If there exists a kernel function in whose reproducing kernel Hilbert space the structural domains of proteins become well separated, then protein structures can be parsed into domains without a complex clustering algorithm. Inspired by this idea, we developed a protein domain decomposition method based on diffusion kernels on protein graphs. We examined all combinations of four graph node kernels and two clustering algorithms to investigate their capability to decompose protein structures. The proposed method was tested on five of the most commonly used benchmark datasets for protein domain assignment plus a comprehensive non-redundant dataset. The results show that the method, using one of the diffusion kernels, performs competitively with four of the best automatic methods. Our method can also offer alternative partitionings for the same structure, which is in line with the subjective definition of a protein domain. With competitive accuracy and balanced performance on simple and complex structures, despite relying on a relatively naive criterion to choose the optimal decomposition, the proposed method shows that diffusion kernels on graphs in particular, and kernel functions in general, are promising measures for parsing proteins into domains and for performing other structural analyses of proteins.
The size and interconnectedness of the protein graphs make them promising targets for diffusion kernels as measures of affinity between amino acids. The versatility of our method allows the implementation of future kernels with higher performance. The source code of the proposed method is accessible at https://github.com/taherimo/kludo. The method is also available as a web application at https://cbph.ir/tools/kludo.
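
To make the idea of a diffusion kernel on a protein graph concrete, the following is a minimal sketch and not the authors' implementation (their code is at the GitHub address above). It assumes an unweighted residue contact graph, the standard diffusion kernel K = exp(-beta * L) with L the graph Laplacian, and a deliberately naive seed-based assignment in place of the clustering algorithms the abstract mentions. The toy graph, the value of beta, and the names `diffusion_kernel` and `assign_to_seeds` are all illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def graph_laplacian(adjacency: Matrix) -> Matrix:
    """Return the combinatorial Laplacian L = D - A of an undirected graph."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def diffusion_kernel(adjacency: Matrix, beta: float = 0.5, terms: int = 40) -> Matrix:
    """Approximate K = exp(-beta * L) with a truncated Taylor series."""
    n = len(adjacency)
    scaled = [[-beta * x for x in row] for row in graph_laplacian(adjacency)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    kernel = [row[:] for row in term]
    for k in range(1, terms + 1):
        # After this update, term holds (-beta * L)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


def assign_to_seeds(kernel: Matrix, seeds: List[int]) -> List[int]:
    """Label each node with the index of the seed it has the highest affinity to."""
    return [
        max(range(len(seeds)), key=lambda s: kernel[i][seeds[s]])
        for i in range(len(kernel))
    ]


# Toy "protein graph" (assumption): two tightly connected triangles of residues
# joined by a single contact, standing in for two domains with one linker contact.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
ADJ = [[0.0] * 6 for _ in range(6)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1.0

K = diffusion_kernel(ADJ)
LABELS = assign_to_seeds(K, seeds=[0, 5])
print(LABELS)  # [0, 0, 0, 1, 1, 1]
```

In this toy case the kernel affinity is higher within each triangle than across the single linking contact, so even the naive seed assignment recovers the two groups. Real structures would need a principled choice of beta, of the graph construction, and of the clustering step.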


Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021;433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]

Abstract

Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.


de Oliveira S, Deane C. Co-evolution techniques are reshaping the way we do structural bioinformatics. F1000Res 2017;6:1224. [PMID: 28781768 PMCID: PMC5531156 DOI: 10.12688/f1000research.11543.1] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Accepted: 07/24/2017] [Indexed: 11/20/2022] [Open Access]

Postic G, Ghouzam Y, Chebrek R, Gelly JC. An ambiguity principle for assigning protein structural domains. Sci Adv 2017;3:e1600552. [PMID: 28097215 PMCID: PMC5235333 DOI: 10.1126/sciadv.1600552] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/18/2016] [Accepted: 11/28/2016] [Indexed: 05/20/2023]

CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences. Methods Mol Biol 2017;1558:79-110. [PMID: 28150234 DOI: 10.1007/978-1-4939-6783-4_4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/05/2022]

Xue Z, Jang R, Govindarajoo B, Huang Y, Wang Y. Extending Protein Domain Boundary Predictors to Detect Discontinuous Domains. PLoS One 2015;10:e0141541. [PMID: 26502173 PMCID: PMC4621036 DOI: 10.1371/journal.pone.0141541] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/03/2015] [Accepted: 10/10/2015] [Indexed: 11/18/2022] [Open Access]

The history of the CATH structural classification of protein domains. Biochimie 2015;119:209-17. [PMID: 26253692 PMCID: PMC4678953 DOI: 10.1016/j.biochi.2015.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 07/01/2015] [Accepted: 08/01/2015] [Indexed: 11/21/2022]

Wieninger SA, Ullmann GM. CoMoDo: Identifying Dynamic Protein Domains Based on Covariances of Motion. J Chem Theory Comput 2015;11:2841-54. [DOI: 10.1021/acs.jctc.5b00150] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]

Ansari ES, Eslahchi C, Pezeshk H, Sadeghi M. ProDomAs, protein domain assignment algorithm using center-based clustering and independent dominating set. Proteins 2014;82:1937-46. [PMID: 24596179 DOI: 10.1002/prot.24547] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/05/2013] [Revised: 02/09/2014] [Accepted: 02/20/2014] [Indexed: 11/07/2022]

Skorupka K, Han SK, Nam HJ, Kim S, Faham S. Protein design by fusion: implications for protein structure prediction and evolution. Acta Crystallogr D Biol Crystallogr 2013;69:2451-60. [PMID: 24311586 DOI: 10.1107/s0907444913022701] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/05/2013] [Accepted: 08/12/2013] [Indexed: 01/21/2023]

Seo S, Jang Y, Qian P, Liu WK, Choi JB, Lim BS, Kim MK. Efficient prediction of protein conformational pathways based on the hybrid elastic network model. J Mol Graph Model 2013;47:25-36. [PMID: 24296313 DOI: 10.1016/j.jmgm.2013.10.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/12/2013] [Revised: 10/19/2013] [Accepted: 10/22/2013] [Indexed: 11/18/2022]

Arab SS, Gharamaleki MP, Pashandi Z, Mobasseri R. Putracer: a novel method for identification of continuous-domains in multi-domain proteins. J Bioinform Comput Biol 2013;11:1340012. [PMID: 23427994 DOI: 10.1142/s021972001340012x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]

Ebina T, Umezawa Y, Kuroda Y. IS-Dom: a dataset of independent structural domains automatically delineated from protein structures. J Comput Aided Mol Des 2013;27:419-26. [PMID: 23715893 DOI: 10.1007/s10822-013-9654-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/27/2012] [Accepted: 05/07/2013] [Indexed: 11/25/2022]

Gomes M, Hamer R, Reinert G, Deane CM. Mutual information and variants for protein domain-domain contact prediction. BMC Res Notes 2012;5:472. [PMID: 23244412 PMCID: PMC3532072 DOI: 10.1186/1756-0500-5-472] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/15/2012] [Accepted: 08/10/2012] [Indexed: 01/20/2023] [Open Access]

Abstract

BACKGROUND

Predicting protein contacts solely based on sequence information remains a challenging problem, despite the huge amount of sequence data at our disposal. Mutual Information (MI), an information theory measure, has been extensively employed and modified to identify residues within a protein (intra-protein) that are in contact. More recently MI and its variants have also been used in the prediction of contacts between proteins (inter-protein).

METHODS

Here we assess the predictive power of MI and variants for domain-domain contact prediction. We test original MI and these variants, which are called MIp, MIc and ZNMI, on 40 domain-domain test cases containing 10,753 sequences. We also propose and evaluate two new versions of MI that consider triangles of residues and the physicochemical properties of the amino acids, respectively.

RESULTS

We found that all versions of MI are skewed towards predicting surface residues. Since domain-domain contacts are on the surface of each domain, we considered only surface residues when attempting to predict contacts. Our analysis shows that MIc is the best current MI domain-domain contact predictor. At 20% recall MIc achieved a precision of 44.9% when only surface residues were considered. Our triangle and reduced alphabet variants of MI highlight the delicate trade-off between signal and noise in the use of MI for domain-domain contact prediction. We also examine a specific "successful" case study and demonstrate that here, when considering surface residues, even the most accurate domain-domain contact predictor, MIc, performs no better than random.

CONCLUSIONS

All tested variants of MI are skewed towards predicting surface residues. When considering surface residues only, we find MIc to be the best current MI domain-domain contact predictor. Its performance, however, is not as good as a non-MI based contact predictor, i-Patch. Additionally, the intra-protein contact prediction capabilities of MIc outperform its domain-domain contact prediction abilities.
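
As an illustration of the basic quantity this abstract builds on, here is a minimal sketch of plain mutual information between two columns of a multiple sequence alignment. It covers only the original, uncorrected MI; the MIp, MIc and ZNMI variants and the triangle and reduced-alphabet versions evaluated in the paper are not implemented. The function name and the four-sequence toy columns are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # Compare the observed pair frequency with the product of the marginals.
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Toy columns (assumption): one residue per aligned sequence.
covarying = mutual_information("AAGG", "LLVV")    # substitutions are perfectly coupled
independent = mutual_information("AAGG", "LVLV")  # substitutions are unrelated
print(covarying, independent)  # 1.0 0.0
```

High MI between a column in one domain and a column in another is the raw signal that such predictors rank. The abstract's finding that every tested variant is skewed towards surface residues is a caution against reading this raw score directly as evidence of contact.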


Genoni A, Morra G, Colombo G. Identification of domains in protein structures from the analysis of intramolecular interactions. J Phys Chem B 2012;116:3331-43. [PMID: 22384792 DOI: 10.1021/jp210568a] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 12/31/2022]

Andreeva A. Classification of proteins: available structural space for molecular modeling. Methods Mol Biol 2012;857:1-31. [PMID: 22323215 DOI: 10.1007/978-1-61779-588-6_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]

Flores SC, Gerstein MB. Predicting protein ligand binding motions with the conformation explorer. BMC Bioinformatics 2011;12:417. [PMID: 22032721 PMCID: PMC3354956 DOI: 10.1186/1471-2105-12-417] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/31/2011] [Accepted: 10/27/2011] [Indexed: 11/26/2022] [Open Access]

Hamer R, Luo Q, Armitage JP, Reinert G, Deane CM. i-Patch: interprotein contact prediction using local network information. Proteins 2011;78:2781-97. [PMID: 20635422 DOI: 10.1002/prot.22792] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]

Esque J, Oguey C, de Brevern AG. Comparative Analysis of Threshold and Tessellation Methods for Determining Protein Contacts. J Chem Inf Model 2011;51:493-507. [DOI: 10.1021/ci100195t] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022]

Tai CH, Sam V, Gibrat JF, Garnier J, Munson PJ, Lee B. Protein domain assignment from the recurrence of locally similar structures. Proteins 2010;79:853-66. [PMID: 21287617 DOI: 10.1002/prot.22923] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/20/2010] [Revised: 10/14/2010] [Accepted: 10/18/2010] [Indexed: 11/10/2022]

He Z, Zhao Y, Mei G, Li N, Chen Y. Could protein tertiary structure influence mammary transgene expression more than tissue specific codon usage? Transgenic Res 2010;19:519-33. [PMID: 20563642 PMCID: PMC2902731 DOI: 10.1007/s11248-010-9411-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/11/2009] [Accepted: 05/19/2010] [Indexed: 12/03/2022]

Keating KS, Flores SC, Gerstein MB, Kuhn LA. StoneHinge: hinge prediction by network analysis of individual protein structures. Protein Sci 2009;18:359-71. [PMID: 19180449 DOI: 10.1002/pro.38] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022]

Bruylants G, Redfield C. (15)N NMR relaxation data reveal significant chemical exchange broadening in the alpha-domain of human alpha-lactalbumin. Biochemistry 2009;48:4031-9. [PMID: 19309110 DOI: 10.1021/bi900023m] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]

Faure G, Bornot A, de Brevern AG. Analysis of protein contacts into Protein Units. Biochimie 2009;91:876-87. [PMID: 19383526 DOI: 10.1016/j.biochi.2009.04.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 07/24/2008] [Accepted: 04/13/2009] [Indexed: 11/18/2022]

Shi S, Pei J, Sadreyev RI, Kinch LN, Majumdar I, Tong J, Cheng H, Kim BH, Grishin NV. Analysis of CASP8 targets, predictions and assessment methods. Database (Oxford) 2009;2009:bap003. [PMID: 20157476 PMCID: PMC2794793 DOI: 10.1093/database/bap003] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 01/27/2009] [Accepted: 02/21/2009] [Indexed: 11/17/2022]

Majumdar I, Kinch LN, Grishin NV. A database of domain definitions for proteins with complex interdomain geometry. PLoS One 2009;4:e5084. [PMID: 19352501 PMCID: PMC2662426 DOI: 10.1371/journal.pone.0005084] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 02/10/2009] [Accepted: 03/10/2009] [Indexed: 11/18/2022] [Open Access]

Liu Q, Huang J, Liu H, Wan P, Ye X, Xu Y. Analyses of domains and domain fusions in human proto-oncogenes. BMC Bioinformatics 2009;10:88. [PMID: 19292927 PMCID: PMC2679021 DOI: 10.1186/1471-2105-10-88] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/03/2008] [Accepted: 03/17/2009] [Indexed: 11/18/2022] [Open Access]

Sikder AR, Zomaya AY. Inferring boundary information of discontinuous-domain proteins. IEEE Trans Nanobioscience 2008;7:200-5. [PMID: 18779100 DOI: 10.1109/tnb.2008.2002283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]

Faure G, Bornot A, de Brevern AG. Protein contacts, inter-residue interactions and side-chain modelling. Biochimie 2008;90:626-39. [DOI: 10.1016/j.biochi.2007.11.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 06/14/2007] [Accepted: 11/22/2007] [Indexed: 10/22/2022]

Xu D, Xu Y. Protein databases on the internet. Curr Protoc Mol Biol 2008;Chapter 19:Unit 19.4. [PMID: 18265344 DOI: 10.1002/0471142727.mb1904s68] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]

Redfern OC, Harrison A, Dallman T, Pearl FMG, Orengo CA. CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures. PLoS Comput Biol 2008;3:e232. [PMID: 18052539 PMCID: PMC2098860 DOI: 10.1371/journal.pcbi.0030232] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Received: 03/02/2007] [Accepted: 10/11/2007] [Indexed: 11/19/2022] [Open Access]

Abstract

We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. 
Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.

Proteins comprise individual folding units known as domains, with a significant proportion containing two or more (multidomain structures). Each domain is thought to represent a unit of evolution and adopts a specific fold. Detecting domains is often the first step in classifying proteins into evolutionary families for studying the relationship between sequence, structure, and function. Automatically identifying domains from structural data is problematic due to the fact that domains vary substantially in their compactness and geometric separation from one another in the whole protein. We present a novel method, CATHEDRAL, which iteratively identifies each domain by comparing a query structure against a library of manually verified domains in the CATH domain database through computational structure comparison. We find that CATHEDRAL is able to outperform the majority of popular structure comparison methods for finding structural relatives. Furthermore, it is able to accurately identify domain boundaries and outperform other methods of structure-based domain prediction for the majority of proteins. CATHEDRAL is available as a Webserver to provide domain annotations for the community and hence aid in structural and functional characterisation of newly solved protein structures.


Ingolfsson H, Yona G. Protein domain prediction. Methods Mol Biol 2008;426:117-143. [PMID: 18542860 DOI: 10.1007/978-1-60327-058-8_7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/26/2023]

Russell RB. Classification of protein folds. Mol Biotechnol 2007;36:238-47. [PMID: 17873410 DOI: 10.1007/s12033-007-0032-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/30/1999] [Revised: 11/30/1999] [Accepted: 11/30/1999] [Indexed: 11/26/2022]

Emmert-Streib F, Mushegian A. A topological algorithm for identification of structural domains of proteins. BMC Bioinformatics 2007;8:237. [PMID: 17608939 PMCID: PMC1933582 DOI: 10.1186/1471-2105-8-237] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 02/28/2007] [Accepted: 07/03/2007] [Indexed: 11/10/2022] [Open Access]

Abstract

Background

Identification of the structural domains of proteins is important for our understanding of the organizational principles and mechanisms of protein folding, and for insights into protein function and evolution. Algorithmic methods developed so far for dissecting proteins of known structure into domains are based on an examination of multiple geometrical, physical and topological features. Successful as many of these approaches are, they employ many heuristics, and it is not clear whether they illuminate any deep underlying principles of protein domain organization. Other well-performing domain dissection methods rely on comparative sequence analysis. These methods are applicable to sequences with known and unknown structure alike, and their success highlights a fundamental principle of protein modularity, but this does not directly improve our understanding of protein spatial structure.

Results

We present a novel graph-theoretical algorithm for the identification of domains in proteins with known three-dimensional structure. We represent the protein structure as an undirected, unweighted and unlabeled graph whose nodes correspond to the secondary structure elements and whose edges represent physical proximity of at least one pair of alpha carbon atoms from two elements. Domains are identified as constrained partitions of the graph, corresponding to sets of vertices obtained by the maximization of the cycle distributions found in the graph. When a partition is found, the algorithm is applied iteratively to each of the resulting subgraphs. The decision to accept or reject a tentative cut position is based on a specific classifier, and the algorithm terminates automatically when partitions are no longer accepted. The distribution of cycles is the only type of information on which the decision about protein dissection is based. Despite the bare-bones simplicity of the approach, our algorithm approaches the best heuristic algorithms in accuracy.

Conclusion

Our graph-theoretical algorithm uses only topological information present in the protein structure itself to find the domains and does not rely on any geometrical or physical information about the protein molecule. Perhaps unexpectedly, these drastic constraints on resources, which result in a seemingly approximate description of protein structures and leave only a handful of parameters available for analysis, do not lead to any significant deterioration of algorithm accuracy. It appears that protein structures can be rigorously treated as topological rather than geometrical objects and that the majority of information about protein domains can be inferred from the coarse-grained measure of pairwise proximity between secondary structure elements.
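
The graph representation described in the Results can be sketched as follows. This is only the graph-construction step under stated assumptions: the abstract does not give a distance cutoff, so the 7.0 Å default is a hypothetical placeholder, and the cycle-distribution partitioning and the accept/reject classifier are not reproduced. The element names and coordinates are invented toy data.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def build_sse_graph(
    sse_ca_coords: Dict[str, List[Coord]], cutoff: float = 7.0
) -> Set[Tuple[str, str]]:
    """Return undirected, unweighted edges between secondary structure elements.

    Two elements are connected when at least one pair of their alpha carbon
    atoms lies within `cutoff` (hypothetical value, in angstroms).
    """
    edges: Set[Tuple[str, str]] = set()
    for (name_a, atoms_a), (name_b, atoms_b) in combinations(sorted(sse_ca_coords.items()), 2):
        if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
            edges.add((name_a, name_b))
    return edges


# Toy alpha carbon coordinates (assumption): one helix packed against one strand,
# and a second strand far away from both.
SSES = {
    "H1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "E1": [(5.0, 0.0, 0.0), (6.5, 0.0, 0.0)],
    "E2": [(30.0, 0.0, 0.0)],
}
SSE_EDGES = build_sse_graph(SSES)
print(SSE_EDGES)  # {('E1', 'H1')}
```

Everything geometric is discarded once the edges are drawn, which is the point the Conclusion makes: the partitioning that follows sees only which elements are neighbours.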


Zhou H, Xue B, Zhou Y. DDOMAIN: Dividing structures into domains using a normalized domain-domain interaction profile. Protein Sci 2007;16:947-55. [PMID: 17456745 PMCID: PMC2206635 DOI: 10.1110/ps.062597307] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 10/23/2022]

Abstract

Dividing protein structures into domains has proven useful for more accurate structural and functional characterization of proteins. Here, we develop a method, called DDOMAIN, that divides structure into DOMAINs using a normalized contact-based domain-domain interaction profile. Results of DDOMAIN are compared to AUTHORS annotations (domain definitions given by the authors who solved the protein structures), as well as to popular SCOP and CATH annotations by human experts and automatic programs. DDOMAIN's automatic annotations are most consistent with the AUTHORS annotations (90% agreement in number of domains and 88% agreement in both number of domains and at least 85% overlap in domain assignment of residues) if its three adjustable parameters are trained by the AUTHORS annotations. By comparison, the agreement is 83% (81% with the at least 85% overlap criterion) between SCOP-trained DDOMAIN and SCOP annotations and 77% (73%) between CATH-trained DDOMAIN and CATH annotations. The agreement between DDOMAIN and AUTHORS annotations goes beyond single-domain proteins (97%, 82%, and 56% for single-, two-, and three-domain proteins, respectively). For an "easy" data set of proteins whose CATH and SCOP annotations agree with each other in number of domains, the agreement is 90% (89%) between "easy-set"-trained DDOMAIN and CATH/SCOP annotations. The consistency between SCOP-trained DDOMAIN and SCOP annotations is superior to that of two other recently developed, SCOP-trained, automatic methods, PDP (protein domain parser) and DomainParser 2. We also tested a simple consensus method made of PDP, DomainParser 2, and DDOMAIN and a different version of DDOMAIN based on a more sophisticated statistical energy function. The DDOMAIN server and its executable are available in the services section on http://sparks.informatics.iupui.edu.
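
The two agreement criteria quoted above (same number of domains, and at least 85% overlap in the domain assignment of residues) can be illustrated with a short sketch. The abstract does not spell out how overlap is computed, so the definition used here, the fraction of residues assigned consistently under the best one-to-one matching of domain labels, is an assumption, as are the function names and the toy 100-residue example.

```python
from itertools import permutations
from typing import Hashable, Sequence


def domain_overlap(pred: Sequence[Hashable], ref: Sequence[Hashable]) -> float:
    """Fraction of residues assigned consistently, maximized over label matchings.

    Returns 0.0 when the two annotations disagree on the number of domains.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("annotations must be non-empty and cover the same residues")
    pred_labels = sorted(set(pred), key=str)
    ref_labels = sorted(set(ref), key=str)
    if len(pred_labels) != len(ref_labels):
        return 0.0
    best = 0
    for perm in permutations(ref_labels):
        mapping = dict(zip(pred_labels, perm))
        matches = sum(mapping[p] == r for p, r in zip(pred, ref))
        best = max(best, matches)
    return best / len(ref)


def annotations_agree(pred, ref, min_overlap: float = 0.85) -> bool:
    """True when domain counts match and the overlap reaches the threshold."""
    return domain_overlap(pred, ref) >= min_overlap


# Toy per-residue annotations (assumption): boundaries differ by five residues.
PREDICTED = [1] * 50 + [2] * 50
REFERENCE = ["A"] * 45 + ["B"] * 55
print(domain_overlap(PREDICTED, REFERENCE))  # 0.95
```

Trying every label permutation is only practical for the small domain counts typical of single chains; a larger problem would call for an assignment algorithm instead.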


FlexOracle: predicting flexible hinges by identification of stable domains. BMC Bioinformatics 2007;8:215. [PMID: 17587456 PMCID: PMC1933439 DOI: 10.1186/1471-2105-8-215] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 11/01/2006] [Accepted: 06/22/2007] [Indexed: 11/28/2022] [Open Access]

Hinge Atlas: relating protein sequence to sites of structural flexibility. BMC Bioinformatics 2007;8:167. [PMID: 17519025 PMCID: PMC1913541 DOI: 10.1186/1471-2105-8-167] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2006] [Accepted: 05/22/2007] [Indexed: 12/03/2022] Open

Abstract

Background

Relating features of protein sequences to structural hinges is important for identifying domain boundaries, understanding structure-function relationships, and designing flexibility into proteins. Efforts in this field have been hampered by the lack of a proper dataset for studying characteristics of hinges.

Results

Using the Molecular Motions Database we have created a Hinge Atlas of manually annotated hinges and a statistical formalism for calculating the enrichment of various types of residues in these hinges.

Conclusion

We found various correlations between hinges and sequence features. Some of these are expected; for instance, we found that hinges tend to occur on the surface and in coils and turns and to be enriched with small and hydrophilic residues. Others are less obvious and intuitive. In particular, we found that hinges tend to coincide with active sites, but unlike the latter they are not at all conserved in evolution. We evaluate the potential for hinge prediction based on sequence.

Motions play an important role in catalysis and protein-ligand interactions. Hinge bending motions comprise the largest class of known motions. Therefore it is important to relate the hinge location to sequence features such as residue type, physicochemical class, secondary structure, solvent exposure, evolutionary conservation, and proximity to active sites. To do this, we first generated the Hinge Atlas, a set of protein motions with the hinge locations manually annotated, and then studied the coincidence of these features with the hinge location. We found that all of the features have a bearing on the hinge location. Most interestingly, we found that hinges tend to occur at or near active sites and yet, unlike the latter, are not conserved. Less surprisingly, we found that hinge residues tend to be small, not hydrophobic or aliphatic, and occur in turns and random coils on the surface. A functional sequence-based hinge predictor was built that uses some of the data generated in this study. The Hinge Atlas is made available to the community for further flexibility studies.


Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res 2006;35:D291-7. [PMID: 17135200 PMCID: PMC1751535 DOI: 10.1093/nar/gkl959] [Citation(s) in RCA: 239] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Tanaka T, Yokoyama S, Kuroda Y. Improvement of domain linker prediction by incorporating loop-length-dependent characteristics. Biopolymers 2006;84:161-8. [PMID: 16134173 DOI: 10.1002/bip.20361] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Kundu S, Sorensen DC, Phillips GN. Automatic domain decomposition of proteins by a Gaussian Network Model. Proteins 2006;57:725-33. [PMID: 15478120 DOI: 10.1002/prot.20268] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Sistla RK, K V B, Vishveshwara S. Identification of domains and domain interface residues in multidomain proteins from graph spectral method. Proteins 2006;59:616-26. [PMID: 15789418 DOI: 10.1002/prot.20444] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Galzitskaya OV, Dovidchenko NV, Lobanov MY, Garbuzynskiy SO. Prediction of protein domain boundaries from statistics of appearance of amino acid residues. Mol Biol 2006. [DOI: 10.1134/s0026893306010146] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Sapranauskas R, Lubys A. Random gene dissection: a tool for the investigation of protein structural organization. Biotechniques 2005;39:395-402. [PMID: 16206911 DOI: 10.2144/05393rr01] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Simon K, Xu J, Kim C, Skrynnikov NR. Estimating the accuracy of protein structures using residual dipolar couplings. J Biomol NMR 2005;33:83-93. [PMID: 16258827 DOI: 10.1007/s10858-005-2601-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/07/2005] [Accepted: 08/05/2005] [Indexed: 05/05/2023]

Dumontier M, Yao R, Feldman HJ, Hogue CWV. Armadillo: domain boundary prediction by amino acid composition. J Mol Biol 2005;350:1061-73. [PMID: 15978619 DOI: 10.1016/j.jmb.2005.05.037] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2004] [Revised: 05/16/2005] [Accepted: 05/18/2005] [Indexed: 11/25/2022]

Abstract

The identification and annotation of protein domains provides a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be deterred by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detect domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within ±20 residues from linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two-domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity. Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. Used as a first-line predictor, Armadillo could also improve the results of domain meta-predictors.
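
The pipeline this abstract describes (map residues through an amino acid index, smooth the profile, then flag positions by Z-score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Armadillo implementation: the index values in `TOY_INDEX` are invented and are not the published DLI, and the window size, the `z_cut` threshold, and all function names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical linker-propensity values for illustration only (NOT the DLI).
# They merely mimic the stated trend: Pro and Gly favour linkers, while small
# hydrophobic residues do not.
TOY_INDEX = {"P": 1.0, "G": 0.8, "A": -0.5, "V": -0.5, "L": -0.4, "I": -0.4}


def smoothed_profile(seq, index, window=5):
    """Convert a sequence to index values and apply a centred moving average."""
    raw = [index.get(aa, 0.0) for aa in seq]  # unknown residues score 0.0
    half = window // 2
    profile = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        profile.append(mean(raw[lo:hi]))  # window is truncated at the ends
    return profile


def z_scores(values):
    """Standardise values against their own mean and population std. dev."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def candidate_linker_positions(seq, index, window=5, z_cut=1.5):
    """Return 0-based positions whose smoothed score has a Z-score >= z_cut."""
    z = z_scores(smoothed_profile(seq, index, window))
    return [i for i, score in enumerate(z) if score >= z_cut]


# Hypothetical two-domain sequence with a Pro/Gly-rich linker in the middle.
toy_sequence = "A" * 20 + "PGPGP" + "A" * 20
print(candidate_linker_positions(toy_sequence, TOY_INDEX))
```

Because the index is a plain dictionary, any other per-residue scale (for example an entropy-based one, as the abstract mentions) could be substituted without changing the rest of the sketch.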


Orengo CA, Thornton JM. Protein families and their evolution: a structural perspective. Annu Rev Biochem 2005;74:867-900. [PMID: 15954844 DOI: 10.1146/annurev.biochem.74.082803.133029] [Citation(s) in RCA: 214] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Bae K, Mallick BK, Elsik CG. Prediction of protein interdomain linker regions by a hidden Markov model. Bioinformatics 2005;21:2264-70. [PMID: 15746283 DOI: 10.1093/bioinformatics/bti363] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Kovacs JA, Chacón P, Abagyan R. Predictions of protein flexibility: first-order measures. Proteins 2004;56:661-8. [PMID: 15281119 DOI: 10.1002/prot.20151] [Citation(s) in RCA: 98] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]