1. Al Nasr K, Yousef F, Jebril R, Jones C. Analytical Approaches to Improve Accuracy in Solving the Protein Topology Problem. Molecules 2018; 23:E28. [PMID: 29360779 PMCID: PMC6017786 DOI: 10.3390/molecules23020028] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/05/2017] [Revised: 01/19/2018] [Accepted: 01/19/2018] [Indexed: 11/17/2022] Open
Abstract
To take advantage of recent advances in genomics and proteomics, it is critical that the three-dimensional physical structure of biological macromolecules be determined. Cryo-electron microscopy (cryo-EM) is a promising and improving method for obtaining these data; however, the resolution is often not sufficient to directly determine the atomic-scale structure. Despite this, the locations of secondary structures are detectable. De novo modeling is a computational approach to modeling these macromolecular structures based on cryo-EM-derived data. During de novo modeling, a mapping between the detected secondary structures and the underlying amino acid sequence must be identified. DP-TOSS (Dynamic Programming for determining the Topology Of Secondary Structures) is one tool that attempts to automate the creation of this mapping. By treating the correspondence between the detected structures and the structures predicted from sequence data as a constraint graph problem, DP-TOSS achieved good accuracy in its original iteration. In this paper, we propose modifications to the scoring methodology of DP-TOSS to improve its accuracy. Three scoring schemes were applied to DP-TOSS and tested: (i) a skeleton-based scoring function; (ii) a geometry-based analytical function; and (iii) a multi-well potential energy-based function. A test of 25 proteins shows that a combination of these schemes can improve the performance of DP-TOSS in solving the topology determination problem for macromolecular proteins.
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA.
- Feras Yousef
  - Department of Mathematics, The University of Jordan, Amman 11942, Jordan.
- Ruba Jebril
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA.
- Christopher Jones
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA.
2. Al Nasr K, Jones C, Yousef F, Jebril R. PEM-fitter: A Coarse-Grained Method to Validate Protein Candidate Models. J Comput Biol 2017; 25:21-32. [PMID: 29140718 DOI: 10.1089/cmb.2017.0191] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
The volumetric images produced by Cryo-Electron Microscopy (cryo-EM) technique are used to model macromolecular assemblies and machines. De novo protein modeling uses these images to computationally model the structure of the molecules. Many candidate conformations are usually generated during the intermediate step. Conventionally, each of these candidates is evaluated by time-consuming approaches such as potential energy. We introduce an initial version of a geometrical screening method that uses the skeleton of the cryo-EM images to evaluate candidate structures. The aim of this method is to reduce the number of native-like candidate conformations and, therefore, reduce the time required for structural evaluation by energy calculations. A test of two datasets was performed. The first dataset contains 10 proteins and shows that our method can successfully detect the correct native structure for the given skeleton among a set of different protein structures. The second dataset contains 12 proteins and shows that our method can filter slightly modified decoy conformations of the same protein. The efficiency of the method is also reported.
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, Nashville, Tennessee
- Christopher Jones
  - Department of Computer Science, Tennessee State University, Nashville, Tennessee
- Feras Yousef
  - Department of Mathematics, The University of Jordan, Amman, Jordan
- Ruba Jebril
  - Department of Computer Science, Tennessee State University, Nashville, Tennessee
3. Xu G, Ma T, Zang T, Wang Q, Ma J. OPUS-CSF: A C-atom-based scoring function for ranking protein structural models. Protein Sci 2017; 27:286-292. [PMID: 29047165 PMCID: PMC5734313 DOI: 10.1002/pro.3327] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/12/2017] [Revised: 10/14/2017] [Accepted: 10/16/2017] [Indexed: 12/12/2022]
Abstract
We report a C‐atom‐based scoring function, named OPUS‐CSF, for ranking protein structural models. Rather than using traditional Boltzmann formula, we built a scoring function (CSF score) based on the native distributions (derived from the entire PDB) of coordinate components of mainchain C (carbonyl) atoms on selected residues of peptide segments of 5, 7, 9, and 11 residues in length. In testing OPUS‐CSF on decoy recognition, it maximally recognized 257 native structures out of 278 targets in 11 commonly used decoy sets, significantly outperforming other popular all‐atom empirical potentials. The average correlation coefficient with TM‐score was also comparable with those of other potentials. OPUS‐CSF is a highly coarse‐grained scoring function, which only requires input of partial mainchain information, and very fast. Thus, it is suitable for applications at early stage of structural building.
Affiliation(s)
- Gang Xu
  - School of Life Sciences, Tsinghua University, Beijing, China
- Tianqi Ma
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
- Tianwu Zang
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
- Qinghua Wang
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas
- Jianpeng Ma
  - School of Life Sciences, Tsinghua University, Beijing, China
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas
4. Minami S, Chikenji G, Ota M. Rules for connectivity of secondary structure elements in protein: Two-layer αβ sandwiches. Protein Sci 2017; 26:2257-2267. [PMID: 28856751 DOI: 10.1002/pro.3285] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/29/2017] [Revised: 08/21/2017] [Accepted: 08/26/2017] [Indexed: 11/09/2022]
Abstract
In protein structures, the fold is described according to the spatial arrangement of secondary structure elements (SSEs: α-helices and β-strands) and their connectivity. The connectivity or the pattern of links among SSEs is one of the most important factors for understanding the variety of protein folds. In this study, we introduced the connectivity strings that encode the connectivities by using the types, positions, and connections of SSEs, and computationally enumerated all the connectivities of two-layer αβ sandwiches. The calculated connectivities were compared with those in natural proteins determined using MICAN, a nonsequential structure comparison method. For 2α-4β, among 23,000 of all connectivities, only 48 were free from irregular connectivities such as loop crossing. Of these, only 20 were found in natural proteins and the superfamilies were biased toward certain types of connectivities. A similar disproportional distribution was confirmed for most of other spatial arrangements of SSEs in the two-layer αβ sandwiches. We found two connectivity rules that explain the bias well: the abundances of interlayer connecting loops that bridge SSEs in the distinct layers; and nonlocal β-strand pairs, two spatially adjacent β-strands located at discontinuous positions in the amino acid sequence. A two-dimensional plot of these two properties indicated that the two connectivity rules are not independent, which may be interpreted as a rule for the cooperativity of proteins.
Affiliation(s)
- Shintaro Minami
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Nagoya, 464-8601, Japan
- George Chikenji
  - Department of Computational Science and Engineering, Graduate School of Engineering, Nagoya University, Nagoya, 464-8601, Japan
- Motonori Ota
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Nagoya, 464-8601, Japan
5. Xie ZR, Chen J, Zhao Y, Wu Y. Decomposing the space of protein quaternary structures with the interface fragment pair library. BMC Bioinformatics 2015; 16:14. [PMID: 25592649 PMCID: PMC4384354 DOI: 10.1186/s12859-014-0437-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/10/2014] [Accepted: 12/18/2014] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: The physical interactions between proteins constitute the basis of protein quaternary structures. They dominate many biological processes in living cells. Deciphering the structural features of interacting proteins is essential to understand their cellular functions. Similar to the space of protein tertiary structures, in which discrete patterns are clearly observed on fold or sub-fold motif levels, it has been found that the space of protein quaternary structures is highly degenerate due to the packing of compact secondary structure elements at interfaces. Therefore, it is necessary to further decompose the protein quaternary structural space into a more local representation. RESULTS: Here we constructed an interface fragment pair library from the current structure database of protein complexes. After structure-based clustering, we found that more than 90% of these interface fragment pairs can be represented by a limited number of highly abundant motifs. These motifs were further used to guide complex assembly. A large-scale benchmark test shows that native-like binding is highly likely in the structural ensemble of modeled protein complexes that were built through the library. CONCLUSIONS: Our study therefore presents supporting evidence that the space of protein quaternary structures can be represented by the combination of a small set of secondary-structure-based packing at binding interfaces. Finally, after future improvements such as adding sequence profiles, we expect this new library will be useful to predict structures of unknown protein-protein interactions.
Affiliation(s)
- Zhong-Ru Xie
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine of Yeshiva University, 1300 Morris Park Avenue, Bronx, NY, 10461, USA.
- Jiawen Chen
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine of Yeshiva University, 1300 Morris Park Avenue, Bronx, NY, 10461, USA.
- Yilin Zhao
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine of Yeshiva University, 1300 Morris Park Avenue, Bronx, NY, 10461, USA.
- Yinghao Wu
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine of Yeshiva University, 1300 Morris Park Avenue, Bronx, NY, 10461, USA.
6. López-Blanco JR, Chacón P. Structural modeling from electron microscopy data. Wiley Interdisciplinary Reviews: Computational Molecular Science 2014. [DOI: 10.1002/wcms.1199] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Affiliation(s)
- José Ramón López-Blanco
  - Department of Biological Physical Chemistry; Rocasolano Physical Chemistry Institute, CSIC; Madrid, Spain
- Pablo Chacón
  - Department of Biological Physical Chemistry; Rocasolano Physical Chemistry Institute, CSIC; Madrid, Spain
7. Al Nasr K, Ranjan D, Zubair M, Chen L, He J. Solving the Secondary Structure Matching Problem in Cryo-EM De Novo Modeling Using a Constrained K-Shortest Path Graph Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:419-430. [PMID: 26355788 DOI: 10.1109/tcbb.2014.2302803] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/05/2023]
Abstract
Electron cryomicroscopy is becoming a major experimental technique in solving the structures of large molecular assemblies. More and more three-dimensional images have been obtained at medium resolutions between 5 and 10 Å. In this resolution range, major α-helices can be detected as cylindrical sticks and β-sheets can be detected as plane-like regions. A critical question in de novo modeling from cryo-EM images is to determine the match between the secondary structures detected in the image and those on the protein sequence. We formulate this matching problem as a constrained graph problem and present an $O(\Delta^2 N^2 2^N)$ algorithm for this NP-hard problem. The algorithm incorporates the dynamic programming approach into a constrained K-shortest path algorithm. Our method, DP-TOSS, has been tested using α-proteins with a maximum of 33 helices and α-β proteins with up to five helices and 12 β-strands. The correct match was ranked within the top 35 for 19 of the 20 α-proteins and all nine α-β proteins tested. The results demonstrate that DP-TOSS improves accuracy, time and memory space in deriving the topologies of the secondary structure elements for proteins with a large number of secondary structures and a complex skeleton.
8. Nasr KA, Liu C, Rwebangira M, Burge L, He J. Intensity-based skeletonization of CryoEM gray-scale images using a true segmentation-free algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1289-98. [PMID: 24384713 PMCID: PMC4104753 DOI: 10.1109/tcbb.2013.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/03/2023]
Abstract
Cryo-electron microscopy is an experimental technique that is able to produce 3D gray-scale images of protein molecules. In contrast to other experimental techniques, cryo-electron microscopy is capable of visualizing large molecular complexes such as viruses and ribosomes. At medium resolution, the positions of the atoms are not visible and the process cannot proceed. The medium-resolution images produced by cryo-electron microscopy are used to derive the atomic structure of the proteins in de novo modeling. The skeletons of the 3D gray-scale images are used to interpret important information that is helpful in de novo modeling. Unfortunately, not all features of the image can be captured using a single segmentation. In this paper, we present a segmentation-free approach to extract the gray-scale curve-like skeletons. The approach relies on a novel representation of the 3D image, where the image is modeled as a graph and a set of volume trees. A test containing 36 synthesized maps and one authentic map shows that our approach can improve the performance of the two tested tools used in de novo modeling. The improvements were 62 and 13 percent for Gorgon and DP-TOSS, respectively.
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, 3500 John Merritt Blvd, McCord Hall, Nashville, TN 37209
- Chunmei Liu
  - Department of Systems and Computer Science, Howard University, 2300 Sixth Street, NW, Washington, DC 20059
- Mugizi Rwebangira
  - Department of Systems and Computer Science, Howard University, 2300 Sixth Street, NW, Washington, DC 20059
- Legand Burge
  - Department of Systems and Computer Science, Howard University, 2300 Sixth Street, NW, Washington, DC 20059
- Jing He
  - Department of Computer Science, Old Dominion University, Engineering & Computer Sciences Bldg., 4700 Elkhorn Ave, Suite 3300, Norfolk, VA 23529
9. Rusu M, Wriggers W. Evolutionary bidirectional expansion for the tracing of alpha helices in cryo-electron microscopy reconstructions. J Struct Biol 2011; 177:410-9. [PMID: 22155667 DOI: 10.1016/j.jsb.2011.11.029] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 05/26/2011] [Revised: 11/22/2011] [Accepted: 11/28/2011] [Indexed: 01/10/2023]
Abstract
Cryo-electron microscopy (cryo-EM) enables the imaging of macromolecular complexes in near-native environments at resolutions that often permit the visualization of secondary structure elements. For example, alpha helices frequently show consistent patterns in volumetric maps, exhibiting rod-like structures of high density. Here, we introduce VolTrac (Volume Tracer) - a novel technique for the annotation of alpha-helical density in cryo-EM data sets. VolTrac combines a genetic algorithm and a bidirectional expansion with a tabu search strategy to trace helical regions. Our method takes advantage of the stochastic search by using a genetic algorithm to identify optimal placements for a short cylindrical template, avoiding exploration of already characterized tabu regions. These placements are then utilized as starting positions for the adaptive bidirectional expansion that characterizes the curvature and length of the helical region. The method reliably predicted helices with seven or more residues in experimental and simulated maps at intermediate (4-10Å) resolution. The observed success rates, ranging from 70.6% to 100%, depended on the map resolution and validation parameters. For successful predictions, the helical axes were located within 2Å from known helical axes of atomic structures.
Affiliation(s)
- Mirabela Rusu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St., Houston, TX 77030, USA.
10. Al Nasr K, Ranjan D, Zubair M, He J. Ranking Valid Topologies of the Secondary Structure Elements Using a Constraint Graph. J Bioinform Comput Biol 2011; 9:415-30. [DOI: 10.1142/s0219720011005604] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 02/04/2011] [Revised: 04/12/2011] [Accepted: 04/17/2011] [Indexed: 11/18/2022]
Abstract
Electron cryo-microscopy is a fast advancing biophysical technique to derive three-dimensional structures of large protein complexes. Using this technique, many density maps have been generated at intermediate resolution such as 6–10 Å resolution. Although it is challenging to derive the backbone of the protein directly from such density maps, secondary structure elements such as helices and β-sheets can be computationally detected. Our work in this paper provides an approach to enumerate the top-ranked possible topologies instead of enumerating the entire population of the topologies. This approach is particularly practical for large proteins. We developed a directed weighted graph, the topology graph, to represent the secondary structure assignment problem. We prove that the problem of finding the valid topology with the minimum cost is NP-hard. We developed an $O(N^2 2^N)$ dynamic programming algorithm to identify the topology with the minimum cost. The test of 15 proteins suggests that our dynamic programming approach is feasible to work with proteins of much larger size than we could before. The largest protein in the test contains 18 helical sticks detected from the density map out of 33 helices in the protein.
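As a rough illustration of how a dynamic program over subsets of sticks reaches a bound of this order, the sketch below assigns sequence helices, in order, to detected sticks and picks a direction for each stick. It is a Held-Karp-style simplification and not the authors' topology-graph algorithm: it assumes every sequence helix has exactly one stick, and the name `min_cost_topology`, the input layout, and the link cost (mismatch between the gap separating two stick ends and 3.8 Å per loop residue) are hypothetical choices made only for this example.

```python
from math import dist, inf

def min_cost_topology(sticks, loop_lengths):
    """Minimum total link cost over all stick orders and directions.

    sticks: list of (end0, end1) pairs of 3D points, one per detected helix stick.
    loop_lengths: residues in the loop between sequence helix k and k+1 (length n-1).
    Both inputs and the cost model are illustrative assumptions.
    """
    n = len(sticks)

    def link_cost(k, a, da, b, db):
        # Direction d means the chain enters a stick at end d and leaves at end 1-d.
        exit_pt = sticks[a][1 - da]
        entry_pt = sticks[b][db]
        # Hypothetical cost: loop span (3.8 A per residue) vs. actual gap.
        return abs(dist(exit_pt, entry_pt) - 3.8 * loop_lengths[k])

    # best[(mask, last, d)]: cheapest way to place the first popcount(mask)
    # sequence helices on the sticks in mask, ending on stick `last` with direction d.
    best = {(1 << s, s, d): 0.0 for s in range(n) for d in (0, 1)}
    for mask in range(1, 1 << n):
        k = bin(mask).count("1") - 1      # index of the loop that follows
        if k == n - 1:
            continue                      # all sticks used, nothing to extend
        for last in range(n):
            for d in (0, 1):
                cur = best.get((mask, last, d))
                if cur is None:
                    continue
                for nxt in range(n):
                    if mask & (1 << nxt):
                        continue
                    for nd in (0, 1):
                        key = (mask | (1 << nxt), nxt, nd)
                        cand = cur + link_cost(k, last, d, nxt, nd)
                        if cand < best.get(key, inf):
                            best[key] = cand
    full = (1 << n) - 1
    return min(best[(full, s, d)] for s in range(n) for d in (0, 1))
```

The table holds on the order of N·2^N states (times two directions), and each state is extended to at most 2N successors, which gives the N²·2^N growth. Handling sticks that are missing from the density map, as in the 18-of-33 case mentioned above, would need additional states that this sketch leaves out.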
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Desh Ranjan
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Mohammad Zubair
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
11. Lu Y, He J, Strauss CEM. Deriving Topology and Sequence Alignment for the Helix Skeleton in Low-Resolution Protein Density Maps. J Bioinform Comput Biol 2008; 6:183-201. [DOI: 10.1142/s0219720008003357] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 06/01/2007] [Revised: 10/07/2007] [Accepted: 10/13/2007] [Indexed: 11/18/2022]
Abstract
Cryoelectron microscopy (cryoEM) is an experimental technique to determine the three-dimensional (3D) structure of large protein complexes. Currently, this technique is able to generate protein density maps at 6–9 Å resolution, at which the skeleton of the structure (which is composed of α-helices and β-sheets) can be visualized. As a step towards predicting the entire backbone of the protein from the protein density map, we developed a method to predict the topology and sequence alignment for the skeleton helices. Our method combines the geometrical information of the skeleton helices with the Rosetta ab initio structure prediction method to derive a consensus topology and sequence alignment for the skeleton helices. We tested the method with 60 proteins. For 45 proteins, the majority of the skeleton helices were assigned a correct topology from one of our top ten predictions. The offsets of the alignment for most of the assigned helices were within ±2 amino acids in the sequence. We also analyzed the use of the skeleton helices as a clustering tool for the decoy structures generated by Rosetta. Our comparison suggests that the topology clustering is a better method than a general overlap clustering method to enrich the ranking of decoys, particularly when the decoy pool is small.
Affiliation(s)
- Yonggang Lu
  - Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Jing He
  - Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Charlie E. M. Strauss
  - Bioscience Division, M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
12. Al Nasr K, Sun W, He J. Structure prediction for the helical skeletons detected from the low resolution protein density map. BMC Bioinformatics 2010; 11 Suppl 1:S44. [PMID: 20122218 PMCID: PMC3009517 DOI: 10.1186/1471-2105-11-s1-s44] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022] Open
Abstract
Background: The current advances in the electron cryo-microscopy technique have made it possible to obtain protein density maps at about 6-10 Å resolution. Although it is hard to derive the protein chain directly from such a low-resolution map, the locations of secondary structures such as helices and strands can be computationally detected. It has been demonstrated that such a low-resolution map can be used during the protein structure prediction process to enhance the structure prediction. Results: We have developed an approach to predict the 3-dimensional structure for the helical skeletons that can be detected from the low-resolution protein density map. This approach does not require the construction of the entire chain and distinguishes the structures based on the conformation of the helices. A test with 35 low-resolution density maps shows that the highest ranked structure with the correct topology can be found within the top 1% of the list ranked by the effective energy formed by the helices. Conclusion: The results in this paper suggest that it is possible to eliminate the great majority of the bad conformations of the helices even without the construction of the entire chain of the protein. For many proteins, the effective contact energy formed by the secondary structures alone can distinguish a small set of likely structures from the pool.
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
13. Sun W, He J. Native secondary structure topology has near minimum contact energy among all possible geometrically constrained topologies. Proteins 2009; 77:159-73. [PMID: 19415754 DOI: 10.1002/prot.22427] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
Abstract
Secondary structure topology in this article refers to the order and the direction of the secondary structures, such as helices and strands, with respect to the protein sequence. Even when the locations of the secondary structure Cα atoms are known, there are still $(N!\,2^N)(M!\,2^M)$ different possible topologies for a protein with N helices and M strands. This work explored the question of whether the native topology is likely to be identified among a large set of all possible geometrically constrained topologies through an evaluation of the residue contact energy formed by the secondary structures, instead of the entire chain. We developed a contact pair specific and distance specific multiwell function based on the statistical characterization of the side chain distances of 413 proteins in the Protein Data Bank. The multiwell function has specific parameters to each of the 210 pairs of residue contacts. We illustrated a general mathematical method to extend a single well function to a multiwell function to represent the statistical data. We have performed a mutation analysis using 50 proteins to generate all the possible geometrically constrained topologies of the secondary structures. The result shows that the native topology is within the top 25% of the list ranked by the effective contact energies of the secondary structures for all the 50 proteins, and is within the top 5% for 34 proteins. As an application, the method was used to derive the structure of the skeletons from a low resolution density map that can be obtained through electron cryomicroscopy.
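To give a feel for how quickly this count grows, the short sketch below evaluates the expression (N! 2^N)(M! 2^M) directly: each of the N helices can occupy any position in the helix order and run in either of two directions, and likewise for the M strands. The function name `topology_count` is only an illustrative choice, and the count is the unconstrained one, before any geometric filtering.

```python
from math import factorial

def topology_count(n_helices, n_strands=0):
    """Number of unconstrained topologies: (N! * 2^N) * (M! * 2^M)."""
    helix_part = factorial(n_helices) * 2 ** n_helices
    strand_part = factorial(n_strands) * 2 ** n_strands
    return helix_part * strand_part

print(topology_count(3))      # 3 helices, no strands -> 48
print(topology_count(2, 2))   # 2 helices, 2 strands  -> 64
print(topology_count(10))     # 10 helices            -> 3715891200
```

Even ten helices already give several billion candidate topologies, which is why ranking by a cheap energy estimate instead of exhaustive modeling matters here.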
Affiliation(s)
- Weitao Sun
  - Department of Computer Science, New Mexico State University, Las Cruces, New Mexico 88003, USA
14. Sun W, He J. Reduction of the secondary structure topological space through direct estimation of the contact energy formed by the secondary structures. BMC Bioinformatics 2009; 10 Suppl 1:S40. [PMID: 19208142 PMCID: PMC2648730 DOI: 10.1186/1471-2105-10-s1-s40] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022] Open
Abstract
Background: Electron cryomicroscopy is a fast developing technique aiming at the determination of the 3-dimensional structures of large protein complexes. Using this technique, protein density maps can be generated with 6 to 10 Å resolution. At such resolutions, the secondary structure elements such as helices and β-strands appear to be skeletons and can be computationally detected. However, it is not known which segment of the protein sequence corresponds to which of the skeletons. The topology in this paper refers to the linear order and the directionality of the secondary structures. For a protein with N helices and M strands, there are $(N!\,2^N)(M!\,2^M)$ different topologies, each of which maps N helix segments and M strand segments on the protein sequence to N helix and M strand skeletons. Since the backbone position is not available in the skeleton, each topology of the skeletons corresponds to additional freedom to position the atoms in the skeletons. Results: We have developed a method to construct the possible atomic structures for the helix skeletons by sampling the solution space of all the possible topologies of the skeletons. Our method also ranks the possible structures based on the contact energy formed by the secondary structures, rather than the entire chain. If we assume that the backbone atomic positions are known for the skeletons, then the native topology of the secondary structures can be found in the top 30% of the ranked list of all possible topologies for all the 30 proteins tested, and within the top 5% for most of the 30 proteins. Without assuming the backbone location of the skeletons, the possible atomic structures of the skeletons can be constructed using the axis of the skeleton and the sequence segments. The best constructed structure for the skeletons has RMSD to native between 4 and 5 Å for the four tested α-proteins. These best constructed structures were ranked the 17th, 31st, 16th and 5th respectively for the four proteins out of 32066, 391833, 98755 and 192935 possible assignments in the pool. Conclusion: Our work suggested that the direct estimation of the contact energy formed by the secondary structures is quite effective in reducing the topological space to a small subset that includes a near native structure for the skeletons.
Affiliation(s)
- Weitao Sun
  - Department of Computer Science, New Mexico State University, Las Cruces, 88003, USA.
15. Wu Y, Dousis AD, Chen M, Li J, Ma J. OPUS-Dom: applying the folding-based method VECFOLD to determine protein domain boundaries. J Mol Biol 2009; 385:1314-29. [PMID: 19026662 PMCID: PMC2753268 DOI: 10.1016/j.jmb.2008.10.093] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 08/27/2008] [Revised: 10/29/2008] [Accepted: 10/31/2008] [Indexed: 10/21/2022]
Abstract
In this article, we present a de novo method for predicting protein domain boundaries, called OPUS-Dom. The core of the method is a novel coarse-grained folding method, VECFOLD, which constructs low-resolution structural models from a target sequence by folding a chain of vectors representing the predicted secondary-structure elements. OPUS-Dom generates a large ensemble of folded structure decoys by VECFOLD and labels the domain boundaries of each decoy by a domain parsing algorithm. Consensus domain boundaries are then derived from the statistical distribution of the putative boundaries and three empirical sequence-based domain profiles. OPUS-Dom generally outperformed several state-of-the-art domain prediction algorithms over various benchmark protein sets. Even though each VECFOLD-generated structure contains large errors, collectively these structures provide a more robust delineation of domain boundaries. The success of OPUS-Dom suggests that the arrangement of protein domains is more a consequence of limited coordination patterns per domain arising from tertiary packing of secondary-structure segments, rather than sequence-specific constraints.
Affiliation(s)
- Yinghao Wu
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
- Mingzhi Chen
- Graduate Program of Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Jialin Li
- Graduate Program of Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Jianpeng Ma
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
- Graduate Program of Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
16
Wu Y, Tian X, Lu M, Chen M, Wang Q, Ma J. Folding of small helical proteins assisted by small-angle X-ray scattering profiles. Structure 2005; 13:1587-97. [PMID: 16271882 DOI: 10.1016/j.str.2005.07.023] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 05/25/2005] [Revised: 07/21/2005] [Accepted: 07/22/2005] [Indexed: 10/25/2022]
Abstract
This paper reports a computational method for folding small helical proteins. The goal was to determine the overall topology of proteins given secondary structure assignment on sequence. In doing so, a Monte Carlo protocol, which combines coarse-grained normal modes and a Hamiltonian at a different scale, was developed to enhance sampling. In addition to the knowledge-based potential functions, a small-angle X-ray scattering (SAXS) profile was also used as a weak constraint for guiding the folding. The algorithm can deliver structural models with overall correct topology, which makes them similar to those of 5-6 Å cryo-EM density maps. The success could help make the SAXS technique a fast and inexpensive solution-phase experimental method for determining the overall topology of small, soluble, but noncrystallizable, helical proteins.
Affiliation(s)
- Yinghao Wu
- Department of Bioengineering, Rice University, Houston, Texas 77005, USA
17
Wu Y, Lu M, Chen M, Li J, Ma J. OPUS-Ca: a knowledge-based potential function requiring only Cα positions. Protein Sci 2007; 16:1449-63. [PMID: 17586777 PMCID: PMC2206690 DOI: 10.1110/ps.072796107] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Indexed: 10/23/2022]
Abstract
In this paper, we report a knowledge-based potential function, named the OPUS-Ca potential, that requires only Cα positions as input. The contributions from other atomic positions were established from pseudo-positions artificially built from a Cα trace for auxiliary purposes. The potential function is formed based on seven major representative molecular interactions in proteins: distance-dependent pairwise energy with orientational preference, hydrogen bonding energy, short-range energy, packing energy, tri-peptide packing energy, three-body energy, and solvation energy. From the testing of decoy recognition on a number of commonly used decoy sets, it is shown that the new potential function outperforms all known Cα-based potentials and most other coarse-grained ones that require more information than Cα positions. We hope that this potential function adds a new tool for protein structural modeling.
Affiliation(s)
- Yinghao Wu
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
18
Abstract
The ability to determine the structure of a protein in solution is a critical tool for structural biology, as proteins in their native state are found in aqueous environments. Using a physical-chemistry-based prediction protocol, we demonstrate the ability to reproduce protein loop geometries in experimentally derived solution structures. Predictions were run on loops drawn from (1) NMR entries in the Protein Data Bank (PDB), and from (2) the RECOORD database, in which NMR entries from the PDB have been standardized and re-refined in explicit solvent. The predicted structures are validated by comparison with experimental distance restraints, a test of structural quality as defined by the WHAT IF structure validation program, root mean square deviation (RMSD) of the predicted loops to the original structural models, and comparison of precision of the original and predicted ensembles. Results show that for the RECOORD ensembles, the predicted loops are consistent with an average of 95%, 91%, and 87% of experimental restraints for the short, medium and long loops, respectively. Prediction accuracy is strongly affected by the quality of the original models, with increases in the percentage of experimental restraints violated of 2% for the short loops, and 9% for both the medium and long loops in the PDB-derived ensembles. We anticipate the application of our protocol to theoretical modeling of protein structures, such as fold recognition methods, as well as to experimental determination of protein structures, or segments, for which only sparse NMR restraint data are available.
Affiliation(s)
- Chaya S Rapp
- Department of Chemistry, Stern College for Women, Yeshiva University, New York, New York 10016, USA.
19
Zong C, Papoian GA, Ulander J, Wolynes PG. Role of Topology, Nonadditivity, and Water-Mediated Interactions in Predicting the Structures of α/β Proteins. J Am Chem Soc 2006; 128:5168-76. [PMID: 16608353 DOI: 10.1021/ja058589v] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Abstract
The folding of α/β proteins involves most of the commonly known structural and dynamic complexities of the protein energy landscapes. Thus, the interplay among different structural components, taking into account the cooperative interactions, is important in determining the success of protein structure prediction. In this work we present further developments of our knowledge-based force field for α/β proteins, introducing more realistic modeling of many-body interactions governing the folding of β-sheets. The model's innovations highlight both specific topological characteristics of secondary structures and the generic nonadditive interactions that are mediated by water. We also investigate how a coarse biasing of the protein morphology can be used to understand the role of heterogeneity in protein collapse. Analysis of the simulation results for three test α/β proteins indicates that the addition of the topological and many-body ingredients to the model helps to greatly reduce the roughness in the energy landscape. Consequently, high quality candidate structures for α/β proteins can be generated from simulated annealing runs, using very modest amounts of computer time.
Affiliation(s)
- Chenghang Zong
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0371, USA.
20
Topf M, Sali A. Combining electron microscopy and comparative protein structure modeling. Curr Opin Struct Biol 2005; 15:578-85. [PMID: 16118050 DOI: 10.1016/j.sbi.2005.08.001] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.7] [Received: 05/24/2005] [Revised: 07/01/2005] [Accepted: 08/10/2005] [Indexed: 10/25/2022]
Abstract
Recently, advances have been made in methods and applications that integrate electron microscopy density maps and comparative modeling to produce atomic structures of macromolecular assemblies. Electron microscopy can benefit from comparative modeling through the fitting of comparative models into electron microscopy density maps. Also, comparative modeling can benefit from electron microscopy through the use of intermediate-resolution density maps in fold recognition, template selection and sequence-structure alignment.
Affiliation(s)
- Maya Topf
- Department of Biopharmaceutical Sciences, University of California San Francisco, San Francisco, CA 94143, USA