1
Huang M, Sun J, Wang J, Ye X, Chen Z, Zhao X, Zhang K, Ma L, Xue J, Luo Y, Wu X, Wang H, Wang C, Liu Z, Xie Y, Chen Y, Wang Q, Wang Y, Gao G. Goose multi-omics database: A comprehensive multi-omics database for goose genomics. Poult Sci 2025; 104:104842. [PMID: 39874782 PMCID: PMC11810826 DOI: 10.1016/j.psj.2025.104842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 01/16/2025] [Accepted: 01/21/2025] [Indexed: 01/30/2025] Open
Abstract
Multi-omics has helped elucidate the gene expression patterns and genomic variations closely associated with economically significant traits in geese. Despite the substantial genomic data generated through extensive goose studies, a unified platform for integrating these datasets is lacking. To address this gap, we introduced the Goose Multi-omics Database (GMD), which is accessible at http://goosedb.com/. The GMD is a comprehensive resource enabling streamlined search, analysis, and visualization of genetic information through a unified interface, providing insights into phenotypic traits, gene sequences, structures, expression profiles, genomic variations, gene families, homology, and collinearity. Equipped with robust analytical tools such as GBrowse and BLAST, the GMD facilitates rapid access to target gene information, significantly enhancing the efficiency and productivity of genomic research. By serving as a versatile and intuitive online repository, the GMD offers transformative potential for advancing goose biology, fostering multi-omics investigations, and integrating cutting-edge methodologies such as deep learning to accelerate discoveries in goose genomics.
Affiliation(s)
- Jiahe Sun
  - Southwest University, Chongqing, 402460, PR China
- Jian Wang
  - Jiangsu Agri-animal Husbandry Vocational College, Taizhou, Jiangsu 225300, PR China
- Xiaoli Ye
  - Southwest University, Chongqing, 402460, PR China
- Zhuping Chen
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Xianzhi Zhao
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Keshan Zhang
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Lin Ma
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Jiajia Xue
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Yi Luo
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Xianwen Wu
  - Department of Laboratory Animal Sciences, Peking University Health Sciences Center, Beijing 100191, PR China
- Haiwei Wang
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Chao Wang
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Zuohua Liu
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Youhui Xie
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Ying Chen
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Qigui Wang
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
- Yi Wang
  - Southwest University, Chongqing, 402460, PR China
- Guangliang Gao
  - Chongqing Engineering Research Center of Goose Genetic Improvement, Institute of Poultry Science, Chongqing Academy of Animal Science, Rongchang District, Chongqing 402460, PR China
2
Wu M, Lv K, Li J, Wu B, He B. Coevolutionary analysis reveals a distal amino acid residue pair affecting the catalytic activity of GH5 processive endoglucanase from Bacillus subtilis BS-5. Biotechnol Bioeng 2022; 119:2105-2114. [PMID: 35438195 DOI: 10.1002/bit.28113] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/11/2022] [Revised: 04/05/2022] [Accepted: 04/08/2022] [Indexed: 11/06/2022]
Abstract
EG5C-1, a processive endoglucanase from Bacillus subtilis, is a typical bifunctional cellulase with endoglucanase and exoglucanase activities. The engineering of processive endoglucanases focuses on tailoring the catalytic pocket or carbohydrate-binding module based on sequence/structure information. Herein, a computational strategy was applied to identify the desired mutants in the enzyme molecule by evolutionary coupling analysis; subsequently, four residue pairs were selected as evolutionary mutational hotspots. Based on iterative-saturation mutagenesis and subsequent enzymatic activity analysis, a superior mutant K51T/L93T was identified away from the active center. This variant had increased specific activity from 4170 U/µmol of wild-type (WT) to 5678 U/µmol towards CMC-Na and an increase towards the substrate Avicel from 320 U/µmol in WT to 521 U/µmol. In addition, kinetic measurements suggested that the superior mutant K51T/L93T had a high substrate affinity (Km) and a remarkable improvement in catalytic efficiency (kcat/Km). Furthermore, molecular dynamics simulations revealed that the K51T/L93T mutation altered the spatial conformation at the active site cleft, enhancing the interaction frequency between active site residues and substrate, improving catalytic efficiency and substrate affinity. The current studies provided some perspectives on the effects of distal residue substitution, which might assist in the engineering of processive endoglucanase or other glycoside hydrolases.
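As a quick arithmetic check on the specific activities quoted in this abstract, the relative gains of the K51T/L93T variant over wild type can be computed directly. The sketch below uses only the four reported values; the helper name `percent_increase` is illustrative and does not come from the paper.

```python
def percent_increase(wild_type: float, mutant: float) -> float:
    """Relative gain of the mutant over wild type, in percent."""
    return (mutant - wild_type) / wild_type * 100.0


# Specific activities in U/µmol, as quoted in the abstract.
cmc_gain = percent_increase(4170, 5678)    # substrate: CMC-Na
avicel_gain = percent_increase(320, 521)   # substrate: Avicel

print(f"CMC-Na: +{cmc_gain:.1f}%")
print(f"Avicel: +{avicel_gain:.1f}%")
```

On these figures the gain is roughly 36% towards CMC-Na and roughly 63% towards Avicel, so the improvement is proportionally larger on the crystalline substrate.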
Affiliation(s)
- Mujunqi Wu
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan road, Nanjing, 211816, Jiangsu, China
- Kemin Lv
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan road, Nanjing, 211816, Jiangsu, China
- Jiahuang Li
  - School of Biopharmacy, China Pharmaceutical University, Nanjing, 211198, Jiangsu, China
- Bin Wu
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan road, Nanjing, 211816, Jiangsu, China
- Bingfang He
  - School of Pharmaceutical Sciences, Nanjing Tech University, 30 Puzhunan road, Nanjing, 211816, Jiangsu, China
3
Soltanikazemi E, Quadir F, Roy RS, Guo Z, Cheng J. Distance-based reconstruction of protein quaternary structures from inter-chain contacts. Proteins 2021; 90:720-731. [PMID: 34716620 PMCID: PMC8816881 DOI: 10.1002/prot.26269] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/16/2021] [Revised: 09/25/2021] [Accepted: 10/12/2021] [Indexed: 12/21/2022]
Abstract
Predicting the quaternary structure of protein complexes is an important problem. Inter‐chain residue‐residue contact prediction can provide useful information to guide the ab initio reconstruction of quaternary structures. However, few methods have been developed to build quaternary structures from predicted inter‐chain contacts. Here, we develop the first method based on gradient descent optimization (GD) to build quaternary structures of protein dimers utilizing inter‐chain contacts as distance restraints. We evaluate GD on several datasets of homodimers and heterodimers using true/predicted contacts and monomer structures as input. GD consistently performs better than both simulated annealing and Markov Chain Monte Carlo simulation. Starting from an arbitrary quaternary structure randomly initialized from the tertiary structures of protein chains and using true inter‐chain contacts as input, GD can reconstruct high‐quality structural models for homodimers and heterodimers with average TM‐score ranging from 0.92 to 0.99 and average interface root mean square distance from 0.72 Å to 1.64 Å. On a dataset of 115 homodimers, using predicted inter‐chain contacts as restraints, the average TM‐score of the structural models built by GD is 0.76. For 46% of the homodimers, high‐quality structural models with TM‐score ≥ 0.9 are reconstructed from predicted contacts. There is a strong correlation between the quality of the reconstructed models and the precision and recall of predicted contacts. Only a moderate precision or recall of inter‐chain contact prediction is needed to build good structural models for most homodimers. Moreover, GD improves the quality of quaternary structures predicted by AlphaFold2 on a Critical Assessment of Techniques for Protein Structure Prediction–Critical Assessments of Predictions of Interactions dataset.
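The idea of using inter-chain contacts as distance restraints in a gradient descent can be illustrated with a deliberately simplified toy. The sketch below is not the authors' method: it assumes a translation-only move of one rigid chain (no rotation), a plain squared-error restraint, and made-up coordinates and a 4.0 target distance. All function names are hypothetical.

```python
import math


def restraint_loss(chain_a, chain_b, contacts, shift):
    """Sum of squared deviations between contact distances and their targets."""
    total = 0.0
    for i, j, target in contacts:
        diff = [chain_b[j][k] + shift[k] - chain_a[i][k] for k in range(3)]
        dist = math.sqrt(sum(c * c for c in diff))
        total += (dist - target) ** 2
    return total


def fit_translation(chain_a, chain_b, contacts, lr=0.05, steps=200):
    """Gradient descent on a translation of chain B (toy: no rotation)."""
    shift = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for i, j, target in contacts:
            diff = [chain_b[j][k] + shift[k] - chain_a[i][k] for k in range(3)]
            dist = math.sqrt(sum(c * c for c in diff))
            if dist == 0.0:
                continue  # gradient is undefined at zero distance; skip this term
            scale = 2.0 * (dist - target) / dist
            for k in range(3):
                grad[k] += scale * diff[k]
        shift = [shift[k] - lr * grad[k] for k in range(3)]
    return shift


# Hypothetical coordinates: chain B starts far from chain A.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
chain_b = [(3.0, -2.0, 9.0), (4.0, -2.0, 9.0), (3.0, -1.0, 9.0)]
# Each contact is (index in A, index in B, target distance).
contacts = [(0, 0, 4.0), (1, 1, 4.0), (2, 2, 4.0)]

initial_loss = restraint_loss(chain_a, chain_b, contacts, [0.0, 0.0, 0.0])
shift = fit_translation(chain_a, chain_b, contacts)
final_loss = restraint_loss(chain_a, chain_b, contacts, shift)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3e}")
```

A realistic reconstruction would also optimize orientation and would need to handle clashes and noisy predicted contacts; the toy only shows how restraint violations translate into a gradient that pulls the chains together.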
Affiliation(s)
- Elham Soltanikazemi
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Farhan Quadir
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Raj S Roy
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
4
Wong AKC, Sze-To HY, Johanning GL. Pattern to Knowledge: Deep Knowledge-Directed Machine Learning for Residue-Residue Interaction Prediction. Sci Rep 2018; 8:14841. [PMID: 30287904 PMCID: PMC6172270 DOI: 10.1038/s41598-018-32834-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/31/2018] [Accepted: 09/17/2018] [Indexed: 11/21/2022] Open
Abstract
Residue-residue close contact (R2R-C) data procured from three-dimensional protein-protein interaction (PPI) experiments is currently used for predicting residue-residue interaction (R2R-I) in PPI. However, due to complex physicochemical environments, R2R-I incidences, facilitated by multiple factors, are usually entangled in the source environment and masked in the acquired data. Here we present a novel method, P2K (Pattern to Knowledge), to disentangle R2R-I patterns and render succinct, discriminative information expressed in different specific R2R-I statistical/functional spaces. Since such knowledge is not visible in the data acquired, we refer to it as deep knowledge. Leveraging the deep knowledge discovered to construct machine learning models for sequence-based R2R-I prediction, without trial-and-error combination of the features over external knowledge of sequences, our R2R-I predictor was validated for its effectiveness under stringent leave-one-complex-out-alone cross-validation in a benchmark dataset, and was surprisingly demonstrated to perform better than an existing sequence-based R2R-I predictor by 28% (p: 1.9E-08). P2K is accessible via our web server at https://p2k.uwaterloo.ca.
Affiliation(s)
- Andrew K C Wong
  - Department of Systems Design Engineering, University of Waterloo, 200 University Avenue West, Waterloo, N2L 3G1, Ontario, Canada
- Ho Yin Sze-To
  - Department of Systems Design Engineering, University of Waterloo, 200 University Avenue West, Waterloo, N2L 3G1, Ontario, Canada
- Gary L Johanning
  - Biosciences Division, SRI International, 333 Ravenswood Ave, Menlo Park, CA, USA
5
Vyas R, Bapat S, Goel P, Karthikeyan M, Tambe SS, Kulkarni BD. Application of Genetic Programming (GP) Formalism for Building Disease Predictive Models from Protein-Protein Interactions (PPI) Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:27-37. [PMID: 28113781 DOI: 10.1109/tcbb.2016.2621042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
Abstract
Protein-protein interactions (PPIs) play a vital role in the biological processes involved in cell functions and disease pathways. The experimental methods known to predict PPIs require tremendous efforts and the results are often hindered by the presence of a large number of false positives. Herein, we demonstrate the use of a new Genetic Programming (GP) based Symbolic Regression (SR) approach for predicting PPIs related to a disease. In a case study, a dataset consisting of one hundred and thirty-five PPI complexes related to cancer was used to construct a generic PPI predicting model with good PPI prediction accuracy and generalization ability. A high correlation coefficient (CC) of 0.893, and low root mean square error (RMSE) and mean absolute percentage error (MAPE) values of 478.221 and 0.239, respectively, were achieved for both the training and test set outputs. To validate the discriminatory nature of the model, it was applied to a dataset of diabetes complexes, where it yielded significantly low CC values. Thus, the GP model developed here serves a dual purpose: (a) a predictor of the binding energy of cancer-related PPI complexes, and (b) a classifier for discriminating PPI complexes related to cancer from those of other diseases.
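The three scores named in this abstract (CC, RMSE, MAPE) have standard definitions, sketched below for reference. This assumes CC means the Pearson correlation coefficient and that MAPE is reported as a fraction rather than a percentage, which the abstract does not state explicitly. The observed and predicted values are invented for illustration and are not data from the paper.

```python
import math


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error as a fraction (observed values must be nonzero)."""
    n = len(observed)
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


# Hypothetical values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]

print(f"CC   = {pearson_cc(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAPE = {mape(observed, predicted):.3f}")
```

Note that RMSE carries the units of the predicted quantity while CC and MAPE are dimensionless, which is why an RMSE in the hundreds can coexist with a MAPE well below 1.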
6
Wozniak PP, Konopka BM, Xu J, Vriend G, Kotulska M. Forecasting residue-residue contact prediction accuracy. Bioinformatics 2017; 33:3405-3414. [PMID: 29036497 DOI: 10.1093/bioinformatics/btx416] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/27/2017] [Accepted: 06/22/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation: Apart from meta-predictors, most of today's methods for residue-residue contact prediction are based entirely on Direct Coupling Analysis (DCA) of correlated mutations in multiple sequence alignments (MSAs). These methods are on average ∼40% correct for the 100 strongest predicted contacts in each protein. The end-user who works on a single protein of interest will not know whether predictions are much more or much less correct than 40%, which is especially a problem if contacts are predicted to steer experimental research on that protein. Results: We designed a regression model that forecasts the accuracy of residue-residue contact prediction for individual proteins with an average error of 7 percentage points. Contacts were predicted with two DCA methods (gplmDCA and PSICOV). The models were built on parameters that describe the MSA, the predicted secondary structure, the predicted solvent accessibility and the contact prediction scores for the target protein. Results show that our models can also be applied to the meta-methods, which was tested on RaptorX. Availability and implementation: All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/dcaQ/. Contact: malgorzata.kotulska@pwr.edu.pl. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- P P Wozniak
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- B M Konopka
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- J Xu
  - Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- G Vriend
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, GA 6525, Nijmegen, The Netherlands
- M Kotulska
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
7
Goodacre N, Edwards N, Danielsen M, Uetz P, Wu C. Predicting nsSNPs that Disrupt Protein-Protein Interactions Using Docking. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1082-1093. [PMID: 26812731 DOI: 10.1109/tcbb.2016.2520931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The human genome contains a large number of protein polymorphisms due to individual genome variation. How many of these polymorphisms lead to altered protein-protein interaction is unknown. We have developed a method to address this question. The intersection of the SKEMPI database (of affinity constants among interacting proteins) and CAPRI 4.0 docking benchmark was docked using HADDOCK, leading to a training set of 166 mutant pairs. A random forest classifier based on the differences in resulting docking scores between the 166 mutant pairs and their wild-types was used to distinguish between variants that have either completely or partially lost binding ability. Fifty percent of non-binders were correctly predicted with a false discovery rate of only 2 percent. The model was tested on a set of 15 HIV-1–human, as well as seven human–human glioblastoma-related, mutant protein pairs: 50 percent of combined non-binders were correctly predicted with a false discovery rate of 10 percent. The model was also used to identify 10 protein-protein interactions between human proteins and their HIV-1 partners that are likely to be abolished by rare non-synonymous single-nucleotide polymorphisms (nsSNPs). These nsSNPs may represent novel and potentially therapeutically valuable targets for anti-viral therapy by disruption of viral binding.
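The two figures quoted in this abstract, the fraction of non-binders correctly predicted and the false discovery rate, follow from a confusion matrix in the usual way. The sketch below assumes "non-binder" is the positive class; the counts are hypothetical and chosen only to land near the reported 50 percent and 2 percent, since the abstract does not give the underlying counts.

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives that the classifier recovers."""
    return true_pos / (true_pos + false_neg)


def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of positive predictions that are wrong."""
    return false_pos / (true_pos + false_pos)


# Hypothetical confusion-matrix counts (not taken from the paper).
true_pos = 50    # non-binders predicted as non-binders
false_neg = 50   # non-binders missed by the classifier
false_pos = 1    # binders wrongly predicted as non-binders

print(f"recall = {recall(true_pos, false_neg):.2f}")
print(f"FDR    = {false_discovery_rate(true_pos, false_pos):.3f}")
```

A classifier tuned this way trades coverage for reliability: it misses half of the true non-binders, but almost every call it does make is correct.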
8
Simkovic F, Ovchinnikov S, Baker D, Rigden DJ. Applications of contact predictions to structural biology. IUCrJ 2017; 4:291-300. [PMID: 28512576 PMCID: PMC5414403 DOI: 10.1107/s2052252517005115] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 12/12/2016] [Accepted: 04/03/2017] [Indexed: 06/07/2023]
Abstract
Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental methods. Finally, predicted contacts can distinguish between biologically relevant interfaces and mere lattice contacts in a final crystal structure, and have potential in the identification of functionally important regions and in foreseeing the consequences of mutations.
Affiliation(s)
- Felix Simkovic
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Sergey Ovchinnikov
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA
- Daniel J. Rigden
  - Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
9
Du T, Liao L, Wu CH. Enhancing interacting residue prediction with integrated contact matrix prediction in protein-protein interaction. EURASIP Journal on Bioinformatics & Systems Biology 2016; 2016:17. [PMID: 27818677 PMCID: PMC5075339 DOI: 10.1186/s13637-016-0051-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/02/2016] [Accepted: 09/25/2016] [Indexed: 11/10/2022]
Abstract
Identifying the residues in a protein that are involved in protein-protein interaction and identifying the contact matrix for a pair of interacting proteins are two computational tasks at different levels of an in-depth analysis of protein-protein interaction. Various methods for solving these two problems have been reported in the literature. However, interacting residue prediction and contact matrix prediction were handled by and large independently in those existing methods, though intuitively good prediction of interacting residues will help with predicting the contact matrix. In this work, we developed a novel protein interacting residue prediction system, contact matrix-interaction profile hidden Markov model (CM-ipHMM), with the integration of contact matrix prediction and the ipHMM interaction residue prediction. We propose to leverage what is learned from the contact matrix prediction and utilize the predicted contact matrix as "feedback" to enhance the interaction residue prediction. The CM-ipHMM model showed significant improvement over the previous method that uses the ipHMM for predicting interaction residues only. This indicates that the downstream contact matrix prediction could help the interaction site prediction.
Affiliation(s)
- Tianchuan Du
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716 USA
- Li Liao
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716 USA
- Cathy H Wu
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716 USA
10
Du T, Liao L, Wu CH, Sun B. Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning. Methods 2016; 110:97-105. [DOI: 10.1016/j.ymeth.2016.06.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 04/13/2016] [Accepted: 06/03/2016] [Indexed: 11/28/2022] Open
11
Saccà C, Teso S, Diligenti M, Passerini A. Improved multi-level protein-protein interaction prediction with semantic-based regularization. BMC Bioinformatics 2014; 15:103. [PMID: 24725682 PMCID: PMC4004462 DOI: 10.1186/1471-2105-15-103] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/08/2013] [Accepted: 03/03/2014] [Indexed: 11/24/2022] Open
Abstract
Background: Protein–protein interactions can be seen as a hierarchical process occurring at three related levels: proteins bind by means of specific domains, which in turn form interfaces through patches of residues. Detailed knowledge about which domains and residues are involved in a given interaction has extensive applications to biology, including better understanding of the binding process and more efficient drug/enzyme design. Alas, most current interaction prediction methods do not identify which parts of a protein actually instantiate an interaction. Furthermore, they also fail to leverage the hierarchical nature of the problem, ignoring otherwise useful information available at the lower levels; when they do, they do not generate predictions that are guaranteed to be consistent between levels. Results: Inspired by earlier ideas of Yip et al. (BMC Bioinformatics 10:241, 2009), in the present paper we view the problem as a multi-level learning task, with one task per level (proteins, domains and residues), and propose a machine learning method that collectively infers the binding state of all object pairs. Our method is based on Semantic Based Regularization (SBR), a flexible and theoretically sound machine learning framework that uses First Order Logic constraints to tie the learning tasks together. We introduce a set of biologically motivated rules that enforce consistent predictions between the hierarchy levels. Conclusions: We study the empirical performance of our method using a standard validation procedure, and compare its performance against the only other existing multi-level prediction technique. We present results showing that our method substantially outperforms the competitor in several experimental settings, indicating that exploiting the hierarchical nature of the problem can lead to better predictions. In addition, our method is also guaranteed to produce interactions that are consistent with respect to the protein–domain–residue hierarchy.
Affiliation(s)
- Andrea Passerini
  - Dipartimento di Ingegneria e Scienza dell'Informazione, University of Trento, Trento, Italy
12
Mosca R, Pons T, Céol A, Valencia A, Aloy P. Towards a detailed atlas of protein–protein interactions. Curr Opin Struct Biol 2013; 23:929-40. [PMID: 23896349 DOI: 10.1016/j.sbi.2013.07.005] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Received: 06/27/2013] [Revised: 07/04/2013] [Accepted: 07/08/2013] [Indexed: 12/30/2022]