1
Hernández Berthet AS, Aptekmann AA, Tejero J, Sánchez IE, Noguera ME, Roman EA. Associating protein sequence positions with the modulation of quantitative phenotypes. Arch Biochem Biophys 2024; 755:109979. [PMID: 38583654 DOI: 10.1016/j.abb.2024.109979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/11/2024] [Accepted: 03/27/2024] [Indexed: 04/09/2024]
Abstract
Although protein sequences encode the information for folding and function, understanding the link between sequence and these features is not an easy task. Unfortunately, the ability to predict how specific amino acids contribute to these features is still considerably limited. Here, we developed a simple algorithm that finds positions in a protein sequence with the potential to modulate the studied quantitative phenotypes. From a few hundred protein sequences, we perform multiple sequence alignments, obtain the per-position pairwise differences for both the sequence and the observed phenotypes, and calculate the correlation between these two quantities. We tested our methodology on four cases: archaeal adenylate kinases and their organisms' optimal growth temperatures, microbial rhodopsins and their maximal absorption wavelengths, mammalian myoglobins and their muscular concentration, and inhibition of HIV protease clinical isolates by two different molecules. We found from 3 to 10 positions tightly associated with those phenotypes, depending on the studied case. We showed that these correlations appear when using individual positions, but an improvement is achieved when the most correlated positions are jointly analyzed. Notably, we performed phenotype predictions using a simple linear model that links per-position divergences and differences in the observed phenotypes. Predictions are comparable to state-of-the-art methodologies, which in most cases are far more complex. All of the calculations are obtained at a very low information cost, since the only input needed is a multiple sequence alignment of protein sequences with their associated quantitative phenotypes. The diversity of the explored systems makes our work a valuable tool to find sequence determinants of biological activity modulation and to predict various functional features for uncharacterized members of a protein family.
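The per-position procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the sequence difference at a position is a binary mismatch (0 if the two residues are identical, 1 otherwise) and uses a plain Pearson correlation, neither of which is specified in the abstract. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def position_phenotype_correlations(aligned_seqs, phenotypes):
    """For each alignment column, correlate the pairwise residue mismatch
    (assumed binary: 0 = same residue, 1 = different) with the pairwise
    absolute phenotype difference over all sequence pairs."""
    pairs = list(combinations(range(len(aligned_seqs)), 2))
    pheno_diffs = [abs(phenotypes[i] - phenotypes[j]) for i, j in pairs]
    scores = []
    for pos in range(len(aligned_seqs[0])):
        seq_diffs = [
            0.0 if aligned_seqs[i][pos] == aligned_seqs[j][pos] else 1.0
            for i, j in pairs
        ]
        scores.append(pearson(seq_diffs, pheno_diffs))
    return scores


# Hypothetical toy alignment: only column 1 (K/R) tracks the phenotype.
toy_alignment = ["AKL", "AKL", "ARL", "ARL"]
toy_phenotypes = [10.0, 11.0, 30.0, 31.0]
scores = position_phenotype_correlations(toy_alignment, toy_phenotypes)
# Positions ordered from most to least correlated with the phenotype.
ranked = sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)
```

In this toy case the fully conserved columns score zero because they carry no pairwise variation, and the K/R column ranks first. A real analysis would start from a proper multiple sequence alignment and might replace the binary mismatch with a substitution-based distance.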
Affiliation(s)
- Ayelén S Hernández Berthet
  - Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina.
- Ariel A Aptekmann
  - Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina; Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08873, USA; Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA.
- Jesús Tejero
  - Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15260, USA; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- Ignacio E Sánchez
  - Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina.
- Martín E Noguera
  - Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina; Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD, Bernal, Argentina.
- Ernesto A Roman
  - Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina.
2
Page BM, Martin TA, Wright CL, Fenton LA, Villar MT, Tang Q, Artigues A, Lamb A, Fenton AW, Swint‐Kruse L. Odd one out? Functional tuning of Zymomonas mobilis pyruvate kinase is narrower than its allosteric, human counterpart. Protein Sci 2022; 31:e4336. [PMID: 35762709 PMCID: PMC9202079 DOI: 10.1002/pro.4336] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2022] [Revised: 04/29/2022] [Accepted: 05/03/2022] [Indexed: 11/08/2022]
Abstract
Various protein properties are often illuminated using sequence comparisons of protein homologs. For example, in analyses of the pyruvate kinase multiple sequence alignment, the set of positions that changed during speciation ("phylogenetic" positions) was enriched for "rheostat" positions in human liver pyruvate kinase (hLPYK). (Rheostat positions are those which, when substituted with various amino acids, yield a range of functional outcomes.) However, the correlation was moderate, which could result from multiple biophysical constraints acting on the same position during evolution and/or various sources of noise. To further examine this correlation, we here tested Zymomonas mobilis PYK (ZmPYK), which has <65% sequence identity to any other PYK sequence. Twenty-six ZmPYK positions were selected based on their phylogenetic scores, substituted with multiple amino acids, and assessed for changes in Kapp-PEP. Although we expected to identify multiple, strong rheostat positions, only one moderate rheostat position was detected. Instead, nearly half of the 271 ZmPYK variants were inactive and most others showed near wild-type function. Indeed, for the active ZmPYK variants, the total range of Kapp-PEP values ("tunability") was 40-fold less than that observed for hLPYK variants. The combined functional studies and sequence comparisons suggest that ZmPYK has evolved functional and/or structural attributes that differ from the rest of the family. We hypothesize that including such "orphan" sequences in MSA analyses obscures the correlations used to predict rheostat positions. Finally, the results raise the intriguing biophysical question as to how the same protein fold can support rheostat positions in one homolog but not another.
Affiliation(s)
- Braelyn M. Page
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Tyler A. Martin
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Collette L. Wright
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, USA
- Lauren A. Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Maite T. Villar
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Qingling Tang
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Antonio Artigues
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Audrey Lamb
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, USA
  - Department of Chemistry, University of Texas at San Antonio, San Antonio, Texas, USA
- Aron W. Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Liskin Swint‐Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
3
Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/14/2023]
Abstract
There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Madrid, Spain.
4
Pazos F. Prediction of Protein Sites and Physicochemical Properties Related to Functional Specificity. Bioengineering (Basel) 2021; 8:bioengineering8120201. [PMID: 34940354 PMCID: PMC8698372 DOI: 10.3390/bioengineering8120201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022] Open
Abstract
Specificity Determining Positions (SDPs) are protein sites responsible for functional specificity within a family of homologous proteins. These positions are extracted from a family’s multiple sequence alignment and complement the fully conserved positions as predictors of functional sites. SDP analysis is now routinely used for locating these specificity-related sites in families of proteins of biomedical or biotechnological interest, with the aim of mutating them to switch specificities or design new ones. There are many different approaches for detecting these positions in multiple sequence alignments. Nevertheless, existing methods report the potential SDPs but do not provide any clue about the physicochemical basis behind the functional specificity, which has to be inferred a posteriori by manually inspecting these positions in the alignment. In this work, a new methodology is presented that, concomitantly with the detection of the SDPs, automatically provides information on the amino-acid physicochemical properties most related to the change in specificity. This new method is applied to two different multiple sequence alignments of homologs of the well-studied RasH protein, representing different cases of functional specificity, and the results are discussed in detail.
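As a rough illustration of how SDP candidates can be pulled from an alignment, the sketch below scores each column by the mutual information between its residues and a subfamily label. This is a generic, commonly used heuristic and not the methodology presented in this article, which is not detailed in the abstract. It also assumes the subfamily labels are known in advance; the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def column_label_mutual_information(column, labels):
    """Mutual information (in bits) between the residues of one alignment
    column and the subfamily label of each sequence."""
    n = len(column)
    residue_counts = Counter(column)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(column, labels))
    mi = 0.0
    for (residue, label), count in joint_counts.items():
        p_joint = count / n
        p_indep = (residue_counts[residue] / n) * (label_counts[label] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi


def rank_candidate_sdps(aligned_seqs, labels):
    """Return (position, MI) pairs sorted from most to least label-informative."""
    scores = [
        (pos, column_label_mutual_information([s[pos] for s in aligned_seqs], labels))
        for pos in range(len(aligned_seqs[0]))
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy family: only column 2 (D/E) separates subfamily "a" from "b".
toy_seqs = ["GAD", "GSD", "GAE", "GSE"]
toy_labels = ["a", "a", "b", "b"]
ranking = rank_candidate_sdps(toy_seqs, toy_labels)
```

A fully conserved column and a column that varies independently of the labels both score zero here, while the column that is conserved within each subfamily but differs between them scores highest. Such a score only flags positions. It does not say which physicochemical property drives the specificity, which is the gap the abstract describes.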
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), c/Darwin, 3, 28049 Madrid, Spain
5
Karakulak T, Rifaioglu AS, Rodrigues JPGLM, Karaca E. Predicting the Specificity-Determining Positions of Receptor Tyrosine Kinase Axl. Front Mol Biosci 2021; 8:658906. [PMID: 34195226 PMCID: PMC8236827 DOI: 10.3389/fmolb.2021.658906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022] Open
Abstract
Owing to its clinical significance, modulation of functionally relevant amino acids in protein-protein complexes has attracted a great deal of attention. To this end, many approaches have been proposed to predict the partner-selecting amino acid positions in evolutionarily close complexes. These approaches can be grouped into sequence-based machine learning and structure-based energy-driven methods. In this work, we assessed these methods’ ability to map the specificity-determining positions of Axl, a receptor tyrosine kinase involved in cancer progression and immune system diseases. For sequence-based predictions, we used SDPpred, Multi-RELIEF, and Sequence Harmony. For structure-based predictions, we utilized HADDOCK refinement and molecular dynamics simulations. As a result, we observed that (i) sequence-based methods overpredict partner-selecting residues of Axl and that (ii) combining Multi-RELIEF with HADDOCK-based predictions provides the key Axl residues, covered by the extensive molecular dynamics simulations. Expanding on these results, we propose that a sequence-structure-based approach is necessary to determine specificity-determining positions of Axl, which can guide the development of therapeutic molecules to combat Axl misregulation.
Affiliation(s)
- Tülay Karakulak
  - Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey; Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmet Sureyya Rifaioglu
  - Department of Electrical-Electronics Engineering, İskenderun Technical University, Hatay, Turkey
- João P G L M Rodrigues
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, United States
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey