1
Monzon V, Paysan-Lafosse T, Wood V, Bateman A. Reciprocal best structure hits: using AlphaFold models to discover distant homologues. Bioinformatics Advances 2022; 2:vbac072. [PMID: 36408459 PMCID: PMC9666668 DOI: 10.1093/bioadv/vbac072] [Received: 06/21/2022] [Revised: 09/16/2022] [Accepted: 10/05/2022] [Indexed: 11/17/2022]
Abstract
Motivation: Conventional methods for detecting homologous protein pairs compare protein sequences. However, the sequences of two homologous proteins may diverge significantly and consequently may be undetectable by standard approaches. The release of the AlphaFold 2.0 software enables the prediction of highly accurate protein structures and opens many opportunities to advance our understanding of protein functions, including the detection of homologous protein structure pairs.
Results: In this proof-of-concept work, we search for the closest homologous protein pairs using the structure models of five model organisms from the AlphaFold database. We compare the results with homologous protein pairs detected by their sequence similarity and show that the structural matching approach finds a similar set of results. In addition, we detect potential novel homologs solely with the structural matching approach, which can help to understand the function of uncharacterized proteins and make previously overlooked connections between well-characterized proteins. We also observe limitations of our implementation of the structure-based approach, particularly when handling highly disordered proteins or short protein structures. Our work shows that high-accuracy protein structure models can be used to discover homologous protein pairs, and we expose areas for improvement of this structural matching approach.
Availability and Implementation: Information on the discovered homologous protein pairs can be found at the following URL: https://doi.org/10.17863/CAM.87873. The code can be accessed here: https://github.com/VivianMonzon/Reciprocal_Best_Structure_Hits.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
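The "reciprocal best hits" idea named in the title can be illustrated with a minimal sketch. It assumes that pairwise similarity scores between the proteins of two organisms have already been computed by some structure-comparison tool; the abstract does not name the tool or the scoring scheme, so the score tables, protein identifiers, and function names below are hypothetical and are not taken from the authors' repository.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: query protein -> {target protein: similarity score}.
Hits = Dict[str, Dict[str, float]]


def best_hit(hits: Hits) -> Dict[str, str]:
    """Map each query to its single highest-scoring target."""
    return {query: max(targets, key=targets.get)
            for query, targets in hits.items() if targets}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hit(a_vs_b)
    best_ba = best_hit(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up scores for two proteins in each organism.
a_vs_b = {"A1": {"B1": 0.90, "B2": 0.40}, "A2": {"B1": 0.70, "B2": 0.60}}
b_vs_a = {"B1": {"A1": 0.88, "A2": 0.50}, "B2": {"A1": 0.30, "A2": 0.65}}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy example A2's best hit is B1, but B1's best hit is A1, so only the pair (A1, B1) is reciprocal. Requiring agreement in both directions is what makes the criterion conservative compared with one-directional best hits.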
Affiliation(s)
- Vivian Monzon
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB21 4HH, UK
- Typhaine Paysan-Lafosse
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB21 4HH, UK
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB21 4HH, UK
2
Agapite J, Albou LP, Aleksander SA, Alexander M, Anagnostopoulos AV, Antonazzo G, Argasinska J, Arnaboldi V, Attrill H, Becerra A, Bello SM, Blake JA, Blodgett O, Bradford YM, Bult CJ, Cain S, Calvi BR, Carbon S, Chan J, Chen WJ, Michael Cherry J, Cho J, Christie KR, Crosby MA, Davis P, da Veiga Beltrame E, De Pons JL, D’Eustachio P, Diamantakis S, Dolan ME, dos Santos G, Douglass E, Dunn B, Eagle A, Ebert D, Engel SR, Fashena D, Foley S, Frazer K, Gao S, Gibson AC, Gondwe F, Goodman J, Sian Gramates L, Grove CA, Hale P, Harris T, Thomas Hayman G, Hill DP, Howe DG, Howe KL, Hu Y, Jha S, Kadin JA, Kaufman TC, Kalita P, Karra K, Kishore R, Kwitek AE, Laulederkind SJF, Lee R, Longden I, Luypaert M, MacPherson KA, Martin R, Marygold SJ, Matthews B, McAndrews MS, Millburn G, Miyasato S, Motenko H, Moxon S, Muller HM, Mungall CJ, Muruganujan A, Mushayahama T, Nalabolu HS, Nash RS, Ng P, Nuin P, Paddock H, Paulini M, Perrimon N, Pich C, Quinton-Tulloch M, Raciti D, Ramachandran S, Richardson JE, Gelbart SR, Ruzicka L, Schaper K, Schindelman G, Shimoyama M, Simison M, Shaw DR, Shrivatsav A, Singer A, Skrzypek M, Smith CM, Smith CL, Smith JR, Stein L, Sternberg PW, Tabone CJ, Thomas PD, Thorat K, Thota J, Toro S, Tomczuk M, Trovisco V, Tutaj MA, Tutaj M, Urbano JM, Van Auken K, Van Slyke CE, Wang Q, Wang SJ, Weng S, Westerfield M, Williams G, Wilming LG, Wong ED, Wright A, Yook K, Zarowiecki M, Zhou P, Zytkovicz M. Harmonizing model organism data in the Alliance of Genome Resources. Genetics 2022; 220:iyac022. [PMID: 35380658 PMCID: PMC8982023 DOI: 10.1093/genetics/iyac022] [Received: 12/12/2021] [Accepted: 01/26/2022] [Indexed: 02/06/2023]
Abstract
The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein-protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.