1
|
Pan H, Wu Z, Liu W, Zhang G. AlphaFun: Structural-Alignment-Based Proteome Annotation Reveals why the Functionally Unknown Proteins (uPE1) Are So Understudied. J Proteome Res 2024; 23:1593-1602. [PMID: 38626392 PMCID: PMC11078154 DOI: 10.1021/acs.jproteome.3c00678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 03/27/2024] [Accepted: 04/03/2024] [Indexed: 04/18/2024]
Abstract
With the rapid expansion of sequencing of genomes, the functional annotation of proteins becomes a bottleneck in understanding proteomes. The Chromosome-centric Human Proteome Project (C-HPP) aims to identify all proteins encoded by the human genome and find functional annotations for them. However, until now there are still 1137 identified human proteins without functional annotation, called uPE1 proteins. Sequence alignment was insufficient to predict their functions, and the crystal structures of most proteins were unavailable. In this study, we demonstrated a new functional annotation strategy, AlphaFun, based on structural alignment using deep-learning-predicted protein structures. Using this strategy, we functionally annotated 99% of the human proteome, including the uPE1 proteins and missing proteins, which have not been identified yet. The accuracy of the functional annotations was validated using the known-function proteins. The uPE1 proteins shared similar functions to the known-function PE1 proteins and tend to express only in very limited tissues. They are evolutionally young genes and thus should conduct functions only in specific tissues and conditions, limiting their occurrence in commonly studied biological models. Such functional annotations provide hints for functional investigations on the uPE1 proteins. This proteome-wide-scale functional annotation strategy is also applicable to any other species.
Affiliation(s)
- Hengxin Pan
  - MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Zhenqi Wu
  - MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Wanting Liu
  - MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Gong Zhang
  - MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
|
2
|
Uttarotai T, Mukjang N, Chaisoung N, Pathom-Aree W, Pekkoh J, Pumas C, Sattayawat P. Putative Protein Discovery from Microalgal Genomes as a Synthetic Biology Protein Library for Heavy Metal Bio-Removal. BIOLOGY 2022; 11:biology11081226. [PMID: 36009852 PMCID: PMC9405338 DOI: 10.3390/biology11081226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/06/2022] [Accepted: 08/12/2022] [Indexed: 11/22/2022]
Abstract
Simple Summary: Nowadays, heavy metal polluted wastewater is one of the global challenges that leads to an insufficient supply of clean water. Taking advantage of what nature has to offer, several organisms, including microalgae, can natively bioremediate these heavy metals. However, the effectiveness of such processes does not meet expectations, especially with the increasing amount of pollution in today’s world. Therefore, with the goal of creating effective strains, synthetic biology via bioengineering is widely used as a strategy to enhance the heavy metal bio-removing capability, either by directly engineering the native ability of organisms or by transferring the ability to a more suitable host. In order to do so, a list of genes or proteins involved in the processes is crucial for stepwise engineering. Yet, a large amount of information remains to be discovered. In this work, a comprehensive library of putative proteins that are involved in heavy metal bio-removal from microalgae was constructed. Moreover, with the development of machine learning, the 3D structures of these proteins are also predicted, using machine learning-based methods, to aid the use of synthetic biology further.
Abstract: Synthetic biology is a principle that aims to create new biological systems with particular functions or to redesign the existing ones through bioengineering. Therefore, this principle is often utilized as a tool to put the knowledge learned to practical use in actual fields. However, there is still a great deal of information remaining to be found, and this limits the possible utilization of synthetic biology, particularly on the topic that is the focus of the present work: heavy metal bio-removal. In this work, we aim to construct a comprehensive library of putative proteins that might support heavy metal bio-removal. Hypothetical proteins were discovered from Chlorella and Scenedesmus genomes and extensively annotated.
The protein structures of these putative proteins were also modeled through Alphafold2. Although a portion of this workflow has previously been demonstrated to annotate hypothetical proteins from whole genome sequences, the adaptation of such steps is yet to be done for library construction purposes. We also demonstrated further downstream steps that allow a more accurate function prediction of the hypothetical proteins by subjecting the models generated to structure-based annotation. In conclusion, a total of 72 newly discovered putative proteins were annotated with ready-to-use predicted structures available for further investigation.
Affiliation(s)
- Toungporn Uttarotai
  - Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
- Nilita Mukjang
  - Department of Entomology and Plant Pathology, Faculty of Agriculture, Chiang Mai University, Chiang Mai 50200, Thailand
- Natcha Chaisoung
  - Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
- Wasu Pathom-Aree
  - Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
- Jeeraporn Pekkoh
  - Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
- Chayakorn Pumas
  - Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
- Pachara Sattayawat
  - Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
  - Research Center in Bioresources for Agriculture, Industry and Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
  - Research Center of Microbial Diversity and Sustainable Utilization, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
  - Correspondence:
|
3
|
Kagaya Y, Flannery ST, Jain A, Kihara D. ContactPFP: Protein Function Prediction Using Predicted Contact Information. FRONTIERS IN BIOINFORMATICS 2022; 2. [PMID: 35875419 PMCID: PMC9302406 DOI: 10.3389/fbinf.2022.896295] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022] Open
Abstract
Computational function prediction is one of the most important problems in bioinformatics as elucidating the function of genes is a central task in molecular biology and genomics. Most of the existing function prediction methods use protein sequences as the primary source of input information because the sequence is the most available information for query proteins. There are attempts to consider other attributes of query proteins. Among these attributes, the three-dimensional (3D) structure of proteins is known to be very useful in identifying the evolutionary relationship of proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as input structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure is not experimentally determined for many proteins. In ContactPFP, we overcome this limitation by using residue-residue contact prediction, which has become increasingly accurate due to rapid development in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contact as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with several well-established sequence-based function prediction methods. The comparative study revealed the advantages and weaknesses of ContactPFP compared to contemporary sequence-based methods. There were many cases where it showed higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strength of our method.
Affiliation(s)
- Yuki Kagaya
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Sean T. Flannery
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Aashish Jain
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
  - *Correspondence: Daisuke Kihara,
|
4
|
Ilgisonis EV, Pogodin PV, Kiseleva OI, Tarbeeva SN, Ponomarenko EA. Evolution of Protein Functional Annotation: Text Mining Study. J Pers Med 2022; 12:jpm12030479. [PMID: 35330478 PMCID: PMC8952229 DOI: 10.3390/jpm12030479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 03/07/2022] [Accepted: 03/08/2022] [Indexed: 11/23/2022] Open
Abstract
Within the Human Proteome Project initiative framework for creating functional annotations of uPE1 proteins, the neXt-CP50 Challenge was launched in 2018. In analogy with the missing-protein challenge, each team deciphers the functional features of the proteins in the chromosome-centric mode. However, the neXt-CP50 Challenge is more complicated than the missing-protein challenge: the approaches and methods for solving the problem are clear, but neither the concept of protein function nor specific experimental and/or bioinformatics protocols have been standardized to address it. We proposed using a retrospective analysis of the key HPP repository, the neXtProt database, to identify the most frequently used experimental and bioinformatic methods for analyzing protein functions, and the dynamics of accumulation of functional annotations. It has been shown that the number of proteins with known functions is increasing faster than the experimental confirmation of the existence of questionable proteins in the framework of the missing-protein challenge. At the same time, the functional annotation is based on the guilt-by-association postulate, according to which, based on large-scale AP-MS and Y2H experiments, proteins with unknown functions are most likely mapped through “handshakes” to biochemical processes.
|
5
|
Kalmykova SD, Arapidi GP, Urban AS, Osetrova MS, Gordeeva VD, Ivanov VT, Govorun VM. In Silico Analysis of Peptide Potential Biological Functions. RUSSIAN JOURNAL OF BIOORGANIC CHEMISTRY 2018. [DOI: 10.1134/s106816201804009x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/23/2022]
|
6
|
Lee GY, You DG, Lee HR, Hwang SW, Lee CJ, Yoo YD. Romo1 is a mitochondrial nonselective cation channel with viroporin-like characteristics. J Cell Biol 2018; 217:2059-2071. [PMID: 29545371 PMCID: PMC5987721 DOI: 10.1083/jcb.201709001] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 09/01/2017] [Revised: 01/22/2018] [Accepted: 02/28/2018] [Indexed: 11/22/2022] Open
Abstract
Romo1 regulates mitochondrial reactive oxygen species production and acts as an essential redox sensor in mitochondrial dynamics. Lee et al. demonstrate that Romo1 is a unique mitochondrial ion channel with viroporin-like characteristics that distinguish Romo1 from other known eukaryotic ion channels. Reactive oxygen species (ROS) modulator 1 (Romo1) is a nuclear-encoded mitochondrial inner membrane protein known to regulate mitochondrial ROS production and to act as an essential redox sensor in mitochondrial dynamics. Although its physiological roles have been studied for a decade, the biophysical mechanisms that explain these activities of Romo1 are unclear. In this study, we report that Romo1 is a unique mitochondrial ion channel that differs from currently identified eukaryotic ion channels. Romo1 is a highly conserved protein with structural features of class II viroporins, which are virus-encoded nonselective cation channels. Indeed, Romo1 forms a nonselective cation channel with its amphipathic helical transmembrane domain necessary for pore-forming activity. Notably, channel activity was specifically inhibited by Fe2+ ions, an essential transition metal ion in ROS metabolism. Using structural bioinformatics, we designed an experimental data–guided structural model of Romo1 with a rational hexameric structure. We propose that Romo1 establishes a new category of viroporin-like nonselective cation channel in eukaryotes.
Affiliation(s)
- Gi Young Lee
  - Laboratory of Molecular Cell Biology, Graduate School of Medicine, Korea University College of Medicine, Korea University, Seoul, Republic of Korea
- Deok-Gyun You
  - Laboratory of Molecular Cell Biology, Graduate School of Medicine, Korea University College of Medicine, Korea University, Seoul, Republic of Korea
- Hye-Ra Lee
  - Laboratory of Molecular Cell Biology, Graduate School of Medicine, Korea University College of Medicine, Korea University, Seoul, Republic of Korea
  - Department of Biosystems and Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
- Sun Wook Hwang
  - Department of Biomedical Sciences, Korea University College of Medicine, Korea University, Seoul, Republic of Korea
- C Justin Lee
  - Center for Neuroscience and Functional Connectomics, Korea Institute of Science and Technology, Seoul, Republic of Korea
  - Korea University-Korea Institute of Science and Technology Graduate School of Convergence Technology, Korea University, Seoul, Republic of Korea
- Young Do Yoo
  - Laboratory of Molecular Cell Biology, Graduate School of Medicine, Korea University College of Medicine, Korea University, Seoul, Republic of Korea
|
7
|
Wei Q, McGraw J, Khan I, Kihara D. Using PFP and ESG Protein Function Prediction Web Servers. Methods Mol Biol 2017; 1611:1-14. [PMID: 28451967 DOI: 10.1007/978-1-4939-7015-5_1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022]
Abstract
Elucidating biological function of proteins is a fundamental problem in molecular biology and bioinformatics. Conventionally, protein function is annotated based on homology using sequence similarity search tools such as BLAST and FASTA. These methods perform well when obvious homologs exist for a query sequence; however, they will not provide any functional information otherwise. As a result, the functions of many genes in newly sequenced genomes are left unknown, which await functional interpretation. Here, we introduce two webservers for function prediction methods, which effectively use distantly related sequences to improve function annotation coverage and accuracy: Protein Function Prediction (PFP) and Extended Similarity Group (ESG). These two methods have been tested extensively in various benchmark studies and ranked among the top in community-based assessments for computational function annotation, including Critical Assessment of Function Annotation (CAFA) in 2010-2011 (CAFA1) and 2013-2014 (CAFA2). Both servers are equipped with user-friendly visualizations of predicted GO terms, which provide intuitive illustrations of relationships of predicted GO terms. In addition to PFP and ESG, we also introduce NaviGO, a server for the interactive analysis of GO annotations of proteins. All the servers are available at http://kiharalab.org/software.php.
Affiliation(s)
- Qing Wei
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Joshua McGraw
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Ishita Khan
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
|
8
|
Meng J, Wekesa JS, Shi GL, Luan YS. Protein function prediction based on data fusion and functional interrelationship. Math Biosci 2016; 274:25-32. [DOI: 10.1016/j.mbs.2016.02.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/22/2015] [Revised: 01/08/2016] [Accepted: 02/01/2016] [Indexed: 10/22/2022]
|
9
|
Nakamura T, Tomii K. Protein ligand-binding site comparison by a reduced vector representation derived from multidimensional scaling of generalized description of binding sites. Methods 2016; 93:35-40. [DOI: 10.1016/j.ymeth.2015.08.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/01/2015] [Revised: 07/25/2015] [Accepted: 08/10/2015] [Indexed: 11/25/2022] Open
|
10
|
Terashi G, Takeda-Shitaka M. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area. PLoS One 2015; 10:e0141440. [PMID: 26502070 PMCID: PMC4621035 DOI: 10.1371/journal.pone.0141440] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/19/2015] [Accepted: 10/08/2015] [Indexed: 12/26/2022] Open
Abstract
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. 
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.
Affiliation(s)
- Genki Terashi
- School of Pharmacy, Kitasato University, Tokyo, Japan
|
11
|
Khan IK, Wei Q, Chapman S, KC DB, Kihara D. The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches. Gigascience 2015; 4:43. [PMID: 26380077 PMCID: PMC4570625 DOI: 10.1186/s13742-015-0083-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/31/2014] [Accepted: 08/27/2015] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: Functional annotation of novel proteins is one of the central problems in bioinformatics. With the ever-increasing development of genome sequencing technologies, more and more sequence information is becoming available to analyze and annotate. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013-2014. Evaluation of participating groups was reported in a special interest group meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple, in-house AFP methods. Here, we report benchmark results of our methods obtained in the course of preparation for CAFA2 prior to submitting function predictions for CAFA2 targets.
RESULTS: For CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performances using the original (older) and updated databases. Performance evaluation for PFP with different settings and ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performances of our prediction methods by enriching the predictions with prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed.
CONCLUSIONS: Updating the annotation database was successful, improving the Fmax prediction accuracy score for both PFP and ESG. Adding the prior distribution of GO terms did not make much improvement.
Both of the ensemble methods we developed improved the average Fmax score over all individual component methods except for ESG. Our benchmark results will not only complement the overall assessment that will be done by the CAFA organizers, but also help elucidate the predictive powers of sequence-based function prediction methods in general.
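For readers unfamiliar with the Fmax score mentioned in this abstract, the following is a minimal sketch of the protein-centric Fmax as it is commonly defined in CAFA-style evaluations: at each score threshold, precision is averaged over proteins with at least one prediction, recall over all benchmark proteins, and the best F-measure across thresholds is reported. The function name, input layout, and threshold grid are illustrative assumptions, not the authors' evaluation code, and GO-hierarchy propagation is omitted.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax (sketch).

    predictions: dict protein -> dict GO term -> confidence score in [0, 1]
    truth:       dict protein -> non-empty set of true GO terms
    """
    if thresholds is None:
        # Assumed threshold grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []   # only proteins with at least one prediction at t
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {go for go, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # nothing predicted at this threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

With toy inputs such as `{"P1": {"GO:A": 0.9, "GO:B": 0.4}, "P2": {"GO:C": 0.6}}` against truth `{"P1": {"GO:A"}, "P2": {"GO:C", "GO:D"}}`, the best trade-off is reached at thresholds between 0.4 and 0.6.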
Affiliation(s)
- Ishita K. Khan
  - Department of Computer Sciences, Purdue University, West Lafayette, IN 47907 USA
- Qing Wei
  - Department of Computer Sciences, Purdue University, West Lafayette, IN 47907 USA
- Samuel Chapman
  - Department of Computational Science and Engineering, North Carolina A & T State University, Greensboro, NC 27411 USA
- Dukka B. KC
  - Department of Computational Science and Engineering, North Carolina A & T State University, Greensboro, NC 27411 USA
- Daisuke Kihara
  - Department of Computer Sciences, Purdue University, West Lafayette, IN 47907 USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907 USA
|
12
|
Computational prediction of protein function based on weighted mapping of domains and GO terms. BIOMED RESEARCH INTERNATIONAL 2014; 2014:641469. [PMID: 24868539 PMCID: PMC4017789 DOI: 10.1155/2014/641469] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/21/2013] [Accepted: 03/12/2014] [Indexed: 11/17/2022]
Abstract
In this paper, we propose a novel method, SeekFun, to predict protein function based on weighted mapping of domains and GO terms. Firstly, a weighted mapping of domains and GO terms is constructed according to GO annotations and domain composition of the proteins. The association strength between domain and GO term is weighted by symmetrical conditional probability. Secondly, the mapping is extended along the true paths of the terms based on GO hierarchy. Finally, the terms associated with resident domains are transferred to host protein and real annotations of the host protein are determined by association strengths. Our careful comparisons demonstrate that SeekFun outperforms the concerned methods on most occasions. SeekFun provides a flexible and effective way for protein function prediction. It benefits from the well-constructed mapping of domains and GO terms, as well as the reasonable strategy for inferring annotations of protein from those of its domains.
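The weighting step described in this abstract can be illustrated with a small sketch. It assumes the usual definition of symmetrical conditional probability, SCP(d, t) = P(d, t)^2 / (P(d) P(t)) = P(d | t) P(t | d), estimated from co-occurrence counts over annotated proteins. The function name and input layout are hypothetical, and the true-path extension and annotation-transfer steps of SeekFun are not reproduced here.

```python
from collections import Counter
from itertools import product


def scp_weights(annotations):
    """Weight each (domain, GO term) pair by symmetrical conditional probability.

    annotations: dict protein -> (set of domain IDs, set of GO terms)
    Returns a dict (domain, term) -> SCP weight in (0, 1].
    """
    domain_count = Counter()  # number of proteins containing each domain
    term_count = Counter()    # number of proteins annotated with each term
    pair_count = Counter()    # number of proteins with both domain and term
    for domains, terms in annotations.values():
        domain_count.update(domains)
        term_count.update(terms)
        pair_count.update(product(domains, terms))
    # SCP = P(d,t)^2 / (P(d) * P(t)); the protein total cancels out of the ratio.
    return {
        (d, t): joint ** 2 / (domain_count[d] * term_count[t])
        for (d, t), joint in pair_count.items()
    }
```

A pair scores 1.0 only when the domain and the term always occur together, so promiscuous domains and very general GO terms are down-weighted.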
|
13
|
Abstract
The amount of known protein structures is continuously growing, exhibited in over 95,000 3D structures freely available via the PDB. Over the last decade, pharmaceutical research has sparked interest in computationally extracting information from this large data pool, resulting in a homology-driven knowledge transfer from annotated to new structures. Studying protein structures with respect to understanding and modulating their functional behavior means analyzing their centers of action. Therefore, the detection and description of potential binding sites on the protein surface is a major step towards protein classification and assessment. Subsequently, these representations can be incorporated to compare proteins, and to predict their druggability or function. Especially in the context of target identification and polypharmacology, automated tools for large-scale target comparisons are highly needed. In this article, developments for automated structure-based target assessment are reviewed and remaining challenges as well as future perspectives are discussed.
|
14
|
3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces. Methods Mol Biol 2014; 1137:105-17. [PMID: 24573477 DOI: 10.1007/978-1-4939-0366-5_8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITEcsc. The server is available at http://kiharalab.org/3d-surfer/.
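Descriptor-based surface search of the kind described here reduces each structure to a fixed-length numeric vector, so a database scan becomes a nearest-neighbour ranking. The sketch below assumes the descriptors have already been computed and that similarity is measured by Euclidean distance; the function name, the toy vectors, and the choice of metric are illustrative assumptions and are not taken from the 3D-SURFER implementation.

```python
import math


def rank_by_descriptor(query, database, top_k=5):
    """Rank database entries by Euclidean distance to a query descriptor.

    query:    sequence of floats (precomputed descriptor of the query surface)
    database: dict entry ID -> descriptor of the same length
    Returns up to top_k (distance, entry ID) pairs, closest first.
    """
    scored = sorted((math.dist(query, vec), name) for name, vec in database.items())
    return scored[:top_k]
```

Because each comparison is a single vector distance and needs no structural superposition, a scan over a whole structure database stays cheap enough for interactive use.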
|
15
|
Mass spectrometry coupled experiments and protein structure modeling methods. Int J Mol Sci 2013; 14:20635-57. [PMID: 24132151 PMCID: PMC3821635 DOI: 10.3390/ijms141020635] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/30/2013] [Revised: 09/17/2013] [Accepted: 09/19/2013] [Indexed: 01/02/2023] Open
Abstract
With the accumulation of next generation sequencing data, there is increasing interest in the study of intra-species difference in molecular biology, especially in relation to disease analysis. Furthermore, the dynamics of the protein is being identified as a critical factor in its function. Although accuracy of protein structure prediction methods is high, provided there are structural templates, most methods are still insensitive to amino-acid differences at critical points that may change the overall structure. Also, predicted structures are inherently static and do not provide information about structural change over time. It is challenging to address the sensitivity and the dynamics by computational structure predictions alone. However, with the fast development of diverse mass spectrometry coupled experiments, low-resolution but fast and sensitive structural information can be obtained. This information can then be integrated into the structure prediction process to further improve the sensitivity and address the dynamics of the protein structures. For this purpose, this article focuses on reviewing two aspects: the types of mass spectrometry coupled experiments and structural data that are obtainable through those experiments; and the structure prediction methods that can utilize these data as constraints. Also, short review of current efforts in integrating experimental data in the structural modeling is provided.
|
16
|
Vikrant, Nakhwa P, Badgujar DC, Kumar R, Rathore KKS, Varma AK. Structural and functional characterization of the MERIT40 to understand its role in DNA repair. J Biomol Struct Dyn 2013; 32:2017-32. [PMID: 24125081 DOI: 10.1080/07391102.2013.843473] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
Abstract
MERIT40 (MEdiator of RAP80 Interaction and Targeting 40) is a novel associate of the BRCA1-complex and plays an essential role in DNA damage repair. It is the least characterized protein of the BRCA1-complex and is mainly responsible for maintaining the integrity of the complex. However, the structural and functional aspects of how it regulates complex stability remain elusive. Here, we carried out a comprehensive examination of the biophysical properties of MERIT40 and identified a novel interacting partner, which helps to clarify its role in the BRCA1-complex. The recombinant protein was purified by affinity chromatography, and its unfolding pathway was determined using spectroscopic and calorimetric methods. A molecular model was generated using a combination of modeling approaches, and monomer-monomer docking was carried out to identify the dimeric interface. The disordered region of MERIT40 was cleaved with trypsin and chymotrypsin to demonstrate the existence of a stable domain, whose function was inferred through a DALI search. Our findings suggest that MERIT40 forms a dimer in a concentration-independent manner. Its central region is remarkably stable against protease digestion and is structurally similar to the vWA-like region, a domain mainly present in complement activation factors. MERIT40 undergoes a three-state unfolding transition with a dimeric intermediate. It interacts with ABRAXAS, an adaptor molecule of the BRCA1-complex, thus helping to extend the bridging interactions among the members of the complex, which further stabilizes the whole complex. The results presented in this paper provide first-hand information on the structural and folding behavior of MERIT40. These findings will help in elucidating the role of protein-protein interactions in stabilization of the BRCA1-complex.
Affiliation(s)
- Vikrant
- a Tata Memorial Centre, Advanced Centre for Treatment, Research and Education in Cancer , Kharghar, Navi Mumbai , Maharashtra 410 210 , India
|
17
|
A novel function prediction approach using protein overlap networks. BMC Syst Biol 2013; 7:61. [PMID: 23866986 PMCID: PMC3720179 DOI: 10.1186/1752-0509-7-61] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/08/2013] [Accepted: 07/12/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model called protein overlap network (PON) for the entire genome of yeast, fly, worm, and human, respectively. Each node of the network represents a protein, and two proteins are connected if they share a domain according to InterPro database. RESULTS The function of a protein can be predicted by counting the occurrence frequency of GO (gene ontology) terms associated with domains of direct neighbors. The average success rate and coverage were 34.3% and 43.9%, respectively, for the test genomes, and were increased to 37.9% and 51.3% when a composite PON of the four species was used for the prediction. As a comparison, the success rate was 7.0% in the random control procedure. We also made predictions with GO term annotations of the second layer nodes using the composite network and obtained an impressive success rate (>30%) and coverage (>30%), even for small genomes. Further improvement was achieved by statistical analysis of manually annotated GO terms for each neighboring protein. CONCLUSIONS The PONs are composed of dense modules accompanied by a few long distance connections. Based on the PONs, we developed multiple approaches effective for protein function prediction.
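The neighbor-counting idea described in this abstract can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_pon` and `predict_go_terms`, the toy domain and GO identifiers, and the choice to count each GO term once per neighbor are all hypothetical. The abstract does not specify how ties or repeated domains are handled.

```python
from collections import Counter, defaultdict


def build_pon(protein_domains):
    """Build a protein overlap network: two proteins are linked
    when they share at least one domain (hypothetical helper)."""
    domain_members = defaultdict(set)
    for protein, domains in protein_domains.items():
        for domain in domains:
            domain_members[domain].add(protein)

    neighbors = {protein: set() for protein in protein_domains}
    for members in domain_members.values():
        for protein in members:
            neighbors[protein] |= members - {protein}
    return neighbors


def predict_go_terms(query, neighbors, protein_domains, domain_go, top_n=3):
    """Rank GO terms by how many direct neighbors carry a domain
    associated with that term (assumption: one count per neighbor)."""
    counts = Counter()
    for neighbor in neighbors[query]:
        terms = set()
        for domain in protein_domains[neighbor]:
            terms |= domain_go.get(domain, set())
        counts.update(terms)
    return counts.most_common(top_n)


# Toy data with placeholder identifiers (not real InterPro or GO entries).
protein_domains = {
    "P1": {"dom_A"},
    "P2": {"dom_A", "dom_B"},
    "P3": {"dom_A", "dom_C"},
    "P4": {"dom_D"},
}
domain_go = {
    "dom_A": {"GO:x"},
    "dom_B": {"GO:y"},
    "dom_C": {"GO:y", "GO:z"},
}

pon = build_pon(protein_domains)
ranked = predict_go_terms("P1", pon, protein_domains, domain_go)
print(ranked)
```

In this toy network, P1 is linked to P2 and P3 through the shared domain, while P4 has no neighbors and would therefore receive no prediction, which is one way to read the coverage figure reported in the abstract.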
|
18
|
Chitale M, Khan IK, Kihara D. In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment. BMC Bioinformatics 2013; 14 Suppl 3:S2. [PMID: 23514353 PMCID: PMC3584938 DOI: 10.1186/1471-2105-14-s3-s2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Many Automatic Function Prediction (AFP) methods were developed to cope with an increasing growth of the number of gene sequences that are available from high throughput sequencing experiments. To support the development of AFP methods, it is essential to have community wide experiments for evaluating performance of existing AFP methods. Critical Assessment of Function Annotation (CAFA) is one such community experiment. The meeting of CAFA was held as a Special Interest Group (SIG) meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods, PFP and ESG, which were developed in our lab, using the predictions submitted to CAFA. RESULTS We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate performance of a different scoring function to rank order predictions by PFP as well as PFP/ESG predictions enriched with Priors that simply adds frequently occurring Gene Ontology terms as a part of predictions. Prediction accuracies of each method were also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST. CONCLUSION The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but will also shed light on the relationship of the sequence similarity space and functions that can be inferred from the sequences.
Affiliation(s)
- Meghana Chitale
- Department of Computer Science, Purdue University, 305 N, University Street, West Lafayette, Indiana 47907, USA
|
19
|
Shi Z, Wedd AG, Gras SL. Parallel in vivo DNA assembly by recombination: experimental demonstration and theoretical approaches. PLoS One 2013; 8:e56854. [PMID: 23468883 PMCID: PMC3585241 DOI: 10.1371/journal.pone.0056854] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2012] [Accepted: 01/17/2013] [Indexed: 01/10/2023] Open
Abstract
The development of synthetic biology requires rapid batch construction of large gene networks from combinations of smaller units. Despite the availability of computational predictions for well-characterized enzymes, the optimization of most synthetic biology projects requires combinational constructions and tests. A new building-brick-style parallel DNA assembly framework for simple and flexible batch construction is presented here. It is based on robust recombination steps and allows a variety of DNA assembly techniques to be organized for complex constructions (with or without scars). The assembly of five DNA fragments into a host genome was performed as an experimental demonstration.
Affiliation(s)
- Zhenyu Shi
- School of Chemistry, University of Melbourne, Parkville, Victoria, Australia.
|
20
|
Volkamer A, Kuhn D, Rippmann F, Rarey M. Predicting enzymatic function from global binding site descriptors. Proteins 2012; 81:479-89. [DOI: 10.1002/prot.24205] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2012] [Revised: 09/21/2012] [Accepted: 10/11/2012] [Indexed: 11/09/2022]
|
21
|
Khan I, Chitale M, Rayon C, Kihara D. Evaluation of function predictions by PFP, ESG, and PSI-BLAST for moonlighting proteins. BMC Proc 2012; 6 Suppl 7:S5. [PMID: 23173871 PMCID: PMC3504920 DOI: 10.1186/1753-6561-6-s7-s5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
Background Advancements in function prediction algorithms are enabling large scale computational annotation for newly sequenced genomes. With the increase in the number of functionally well characterized proteins it has been observed that there are many proteins involved in more than one function. These proteins characterized as moonlighting proteins show varied functional behavior depending on the cell type, localization in the cell, oligomerization, multiple binding sites, etc. The functional diversity shown by moonlighting proteins may have significant impact on the traditional sequence based function prediction methods. Here we investigate how well diverse functions of moonlighting proteins can be predicted by some existing function prediction methods. Results We have analyzed the performances of three major sequence based function prediction methods, PSI-BLAST, the Protein Function Prediction (PFP), and the Extended Similarity Group (ESG) on predicting diverse functions of moonlighting proteins. In predicting discrete functions of a set of 19 experimentally identified moonlighting proteins, PFP showed overall highest recall among the three methods. Although ESG showed the highest precision, its recall was lower than PSI-BLAST. Recall by PSI-BLAST greatly improved when BLOSUM45 was used instead of BLOSUM62. Conclusion We have analyzed the performances of PFP, ESG, and PSI-BLAST in predicting the functional diversity of moonlighting proteins. PFP shows overall better performance in predicting diverse moonlighting functions as compared with PSI-BLAST and ESG. Recall by PSI-BLAST greatly improved when BLOSUM45 was used. This analysis indicates that considering weakly similar sequences in prediction enhances the performance of sequence based AFP methods in predicting functional diversity of moonlighting proteins. The current study will also motivate development of novel computational frameworks for automatic identification of such proteins.
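The precision and recall figures compared in this abstract can be illustrated with a short sketch. It assumes exact matching between predicted and annotated GO terms. The study itself may score matches differently (for example by taking the GO hierarchy into account), and the function name `precision_recall` and the placeholder term labels are hypothetical.

```python
def precision_recall(predicted, annotated):
    """Return (precision, recall) for one protein, using exact
    term matches only (an assumption made for this sketch)."""
    predicted, annotated = set(predicted), set(annotated)
    hits = len(predicted & annotated)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(annotated) if annotated else 0.0
    return precision, recall


# Placeholder labels, not real GO identifiers.
annotated = {"GO:a", "GO:b", "GO:e"}          # known functions
broad_method = {"GO:a", "GO:b", "GO:c", "GO:d"}  # many predictions
narrow_method = {"GO:a"}                      # few, cautious predictions

print(precision_recall(broad_method, annotated))
print(precision_recall(narrow_method, annotated))
```

The two toy methods show the trade-off the abstract reports: a method that predicts few terms can reach high precision while recovering fewer of a moonlighting protein's distinct functions, and a broader method recovers more functions at the cost of more incorrect terms.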
Affiliation(s)
- Ishita Khan
- Department of Computer Science, College of Science, Purdue University, West Lafayette, IN 47907, USA.
|
22
|
Messih MA, Chitale M, Bajic VB, Kihara D, Gao X. Protein domain recurrence and order can enhance prediction of protein functions. Bioinformatics 2012; 28:i444-i450. [PMID: 22962465 PMCID: PMC3436825 DOI: 10.1093/bioinformatics/bts398] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION Burgeoning sequencing technologies have generated massive amounts of genomic and proteomic data. Annotating the functions of proteins identified in this data has become a big and crucial problem. Various computational methods have been developed to infer the protein functions based on either the sequences or domains of proteins. The existing methods, however, ignore the recurrence and the order of the protein domains in this function inference. RESULTS We developed two new methods to infer protein functions based on protein domain recurrence and domain order. Our first method, DRDO, calculates the posterior probability of the Gene Ontology terms based on domain recurrence and domain order information, whereas our second method, DRDO-NB, relies on the naïve Bayes methodology using the same domain architecture information. Our large-scale benchmark comparisons show strong improvements in the accuracy of the protein function inference achieved by our new methods, demonstrating that domain recurrence and order can provide important information for inference of protein functions. AVAILABILITY The new models are provided as open source programs at http://sfb.kaust.edu.sa/Pages/Software.aspx. CONTACT dkihara@cs.purdue.edu, xin.gao@kaust.edu.sa SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics Online.
Affiliation(s)
- Mario Abdel Messih
- Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
|