1
Voortman‐Sheetz K, Wrabl JO, Hilser VJ. Impact of local unfolding fluctuations on the evolution of regional sequence preferences in proteins. Protein Sci 2025; 34:e70015. [PMID: 39969063 PMCID: PMC11837041 DOI: 10.1002/pro.70015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 11/07/2024] [Accepted: 12/13/2024] [Indexed: 02/20/2025]
Abstract
The number of distinct structural environments in the proteome (as observed in the Protein Data Bank) may belie an organizing framework, whereby evolution conserves the relative stability of different sequence segments, regardless of the specific structural details present in the final fold. If true, the question arises as to whether the energetic consequences of amino acid substitutions, and thus the frequencies of amino acids within each of these so-called thermodynamic environments, could depend less on what local structure that sequence segment may adopt in the final fold, and more on the local stability of that final structure relative to the unfolded state. To address this question, a previously described ensemble-based approach (the COREX algorithm) was used to define proteins in terms of thermodynamic environments, and the naturally occurring frequencies of amino acids within these environments were used to generate statistical energies (a type of knowledge-based potential). By comparing compatibility scores from the statistical energies with energies calculated using the Rosetta all-atom energy function, we assessed the information overlap between the two approaches. Results revealed a substantial correlation between the statistical scores and those obtained using Rosetta, directly demonstrating that a small number of thermodynamic environments are sufficient to capture the perceived multiplicity of different structural environments in proteins. More importantly, the agreement suggests that regional amino acid distributions within each protein in any proteome have been substantially driven by the evolutionary conservation of the regional differences in stabilities within protein families.
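As a rough illustration of how amino acid frequencies within environments can be turned into statistical energies, the sketch below computes a generic log-odds score, $-\ln[P(aa \mid env)/P(aa)]$. It is not the authors' implementation: the function name, the pseudocount, the environment labels and the toy counts are assumptions made for illustration, and the COREX step that assigns residues to thermodynamic environments is not shown.

```python
import math
from collections import Counter


def statistical_energies(observations, pseudocount=1.0):
    """Return {(environment, amino_acid): -ln(P(aa | env) / P(aa))}.

    observations: list of (environment, amino_acid) pairs.
    pseudocount: hypothetical smoothing term so unseen pairs stay finite.
    """
    pair_counts = Counter(observations)
    env_counts = Counter()
    aa_counts = Counter()
    for (env, aa), n in pair_counts.items():
        env_counts[env] += n
        aa_counts[aa] += n

    total = sum(aa_counts.values())
    n_aa = len(aa_counts)
    energies = {}
    for env in env_counts:
        for aa in aa_counts:
            # Smoothed conditional frequency of the amino acid in this environment.
            p_aa_given_env = (pair_counts[(env, aa)] + pseudocount) / (
                env_counts[env] + pseudocount * n_aa
            )
            # Background frequency of the amino acid over all environments.
            p_aa = aa_counts[aa] / total
            energies[(env, aa)] = -math.log(p_aa_given_env / p_aa)
    return energies


# Toy data (invented): leucine is enriched in the "high" stability
# environment, glycine in the "low" stability environment.
toy_observations = (
    [("high", "L")] * 3 + [("high", "G")]
    + [("low", "G")] * 3 + [("low", "L")]
)
energies = statistical_energies(toy_observations)
```

Under this convention a negative energy marks an amino acid that is over-represented in an environment relative to its background frequency, and a positive energy marks one that is under-represented.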
Affiliation(s)
- Keila Voortman‐Sheetz
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Chemical Biology Interface Graduate Program, Johns Hopkins University, Baltimore, Maryland, USA
- James O. Wrabl
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
2
Sumanaweera D, Suo C, Cujba AM, Muraro D, Dann E, Polanski K, Steemers AS, Lee W, Oliver AJ, Park JE, Meyer KB, Dumitrascu B, Teichmann SA. Gene-level alignment of single-cell trajectories. Nat Methods 2025; 22:68-81. [PMID: 39300283 PMCID: PMC11725504 DOI: 10.1038/s41592-024-02378-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 07/12/2024] [Indexed: 09/22/2024]
Abstract
Single-cell data analysis can infer dynamic changes in cell populations, for example across time, space or in response to perturbation, thus deriving pseudotime trajectories. Current approaches comparing trajectories often use dynamic programming but are limited by assumptions such as the existence of a definitive match. Here we describe Genes2Genes, a Bayesian information-theoretic dynamic programming framework for aligning single-cell trajectories. It captures sequential matches and mismatches of individual genes between a reference and query trajectory, highlighting distinct clusters of alignment patterns. Across both real-world and simulated datasets, it accurately inferred alignments and demonstrated its utility in disease cell-state trajectory analysis. In a proof-of-concept application, Genes2Genes revealed that T cells differentiated in vitro match an immature in vivo state while lacking expression of genes associated with TNF signaling. This demonstrates that precise trajectory alignment can pinpoint divergence from the in vivo system, thus guiding the optimization of in vitro culture conditions.
Affiliation(s)
- Dinithi Sumanaweera
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Theory of Condensed Matter, Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK
- Chenqu Suo
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Paediatrics, Cambridge University Hospitals; Hills Road, Cambridge, UK
- Ana-Maria Cujba
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Daniele Muraro
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Emma Dann
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Krzysztof Polanski
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Alexander S Steemers
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
- Woochan Lee
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Biomedical Sciences, Seoul National University, Seoul, Korea
- Amanda J Oliver
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Jong-Eun Park
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Kerstin B Meyer
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Bianca Dumitrascu
- Department of Statistics, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Sarah A Teichmann
- Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
- Co-director of CIFAR Macmillan Research Program, Toronto, Ontario, Canada
3
Suárez T, Montaño DF, Suárez R. Construction of amino acids reduced alphabets from molecular descriptors for interpretation of N-carbamylase, luciferase and PI3K mutations. Biosystems 2024; 246:105331. [PMID: 39260761 DOI: 10.1016/j.biosystems.2024.105331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2024] [Revised: 09/04/2024] [Accepted: 09/08/2024] [Indexed: 09/13/2024]
Abstract
The classification of amino acids has proven to be a useful tool for understanding the importance of sequence in protein function. Reduced amino acid alphabets are an example of these classifications; when built from physicochemical, structural and quantum characteristics of the amino acids, they simplify the representation of sequences, which is useful in the modelling, design and understanding of proteins. An objective selection of amino acid properties is therefore important, because the classes formed in a reduced alphabet depend on the descriptors used for classification. In this research, based on a careful selection of descriptors for the 20 amino acids, and through techniques such as the information content index and hierarchical cluster analysis with ties in proximity, 20,871,586 reduced amino acid alphabets were constructed. This large collection of reduced alphabets was used to interpret alterations in the function of three proteins (N-carbamylase, luciferase, and PI3K) caused by amino acid changes in their sequences. For this, the similar and different descriptors linked to these mutations were studied. Properties such as volume, hydrophobicity, charge and autocorrelation can be associated with variations in the behaviour of these proteins, while the frequency in specific secondary structures, the Gibbs free energy and some topological and quantum properties can be considered factors that prevent the deactivation of protein function. This work offers the most complete collection of reduced alphabets, which promises to be a useful tool for the interpretation of alterations caused by amino acid mutations in the protein sequence.
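To make the idea of a reduced alphabet concrete, the sketch below recodes a sequence with a four-class alphabet and checks whether a substitution crosses a class boundary. The grouping used here is a common textbook-style partition chosen only for illustration; it is not one of the alphabets constructed in the paper, and the function names are hypothetical.

```python
# Hypothetical four-class reduced alphabet (illustrative grouping only).
EXAMPLE_CLASSES = {
    "h": "AVLIMFWC",  # broadly hydrophobic
    "p": "STNQYGP",   # polar or small
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
}


def build_lookup(classes):
    """Invert {class_label: members} into {amino_acid: class_label}."""
    lookup = {}
    for label, members in classes.items():
        for aa in members:
            lookup[aa] = label
    return lookup


def reduce_sequence(seq, lookup):
    """Rewrite a one-letter amino acid sequence in the reduced alphabet."""
    return "".join(lookup[aa] for aa in seq)


def mutation_changes_class(wild_type, mutant, lookup):
    """True if the substitution moves the residue to a different class."""
    return lookup[wild_type] != lookup[mutant]


lookup = build_lookup(EXAMPLE_CLASSES)
reduced = reduce_sequence("MKDL", lookup)
conservative = mutation_changes_class("L", "I", lookup)
disruptive = mutation_changes_class("L", "D", lookup)
```

A substitution that stays inside one class under many different alphabets is conservative with respect to the descriptors behind those alphabets, whereas one that repeatedly crosses class boundaries points to the descriptors that may explain a functional change.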
Affiliation(s)
- Tatiana Suárez
- CHIMA Grupo de Química Matemática, Universidad de Pamplona, Km 1 Vía Bucaramanga, Pamplona, Colombia
- Diego F Montaño
- Departamento de Química, Universidad de Pamplona, Km 1 Vía Bucaramanga, Pamplona, Colombia
- Rosana Suárez
- CHIMA Grupo de Química Matemática, Universidad de Pamplona, Km 1 Vía Bucaramanga, Pamplona, Colombia
4
Rossi FPN, Flores VS, Uceda-Campos G, Amgarten DE, Setubal JC, da Silva AM. Comparative Analyses of Bacteriophage Genomes. Methods Mol Biol 2024; 2802:427-453. [PMID: 38819567 DOI: 10.1007/978-1-0716-3838-5_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Bacterial viruses (bacteriophages or phages) are the most abundant and diverse biological entities on Earth. There is a renewed worldwide interest in phage-centered research motivated by their enormous potential as antimicrobials to cope with multidrug-resistant pathogens. An ever-growing number of complete phage genomes are becoming available, derived either from newly isolated phages (cultivated phages) or recovered from metagenomic sequencing data (uncultivated phages). Robust comparative analysis is crucial for a comprehensive understanding of genotypic variations of phages and their related evolutionary processes, and to investigate the interaction mechanisms between phages and their hosts. In this chapter, we present a protocol for phage comparative genomics employing tools selected out of the many currently available, focusing on complete genomes of phages classified in the class Caudoviricetes. This protocol provides accurate identification of similarities, differences, and patterns among new and previously known complete phage genomes as well as phage clustering and taxonomic classification.
Affiliation(s)
- Vinicius Sousa Flores
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil
- Guillermo Uceda-Campos
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil
- João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil
- Aline Maria da Silva
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil
5
Mappin F, Bellantuono AJ, Ebrahimi B, DeGennaro M. Odor-evoked transcriptomics of Aedes aegypti mosquitoes. PLoS One 2023; 18:e0293018. [PMID: 37874813 PMCID: PMC10597520 DOI: 10.1371/journal.pone.0293018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/03/2023] [Indexed: 10/26/2023] Open
Abstract
Modulation of odorant receptor mRNA induced by prolonged odor exposure is highly correlated with ligand-receptor interactions in Drosophila as well as in mammals of the Muridae family. If this response feature is conserved in other organisms, it presents an intriguing initial screening tool when searching for novel receptor-ligand interactions in species with predominantly orphan olfactory receptors. We demonstrate that mRNA modulation in response to 1-octen-3-ol odor exposure occurs in a time- and concentration-dependent manner in Aedes aegypti mosquitoes. To investigate gene expression patterns at a global level, we generated an odor-evoked transcriptome associated with 1-octen-3-ol odor exposure. Transcriptomic data revealed that ORs and OBPs were transcriptionally responsive whereas other chemosensory gene families showed little to no differential expression. Alongside chemosensory gene expression changes, transcriptomic analysis found that prolonged exposure to 1-octen-3-ol modulated xenobiotic response genes, primarily members of the cytochrome P450, insect cuticle protein, and glucuronosyltransferase families. Together, these findings suggest that mRNA transcriptional modulation of olfactory receptors caused by prolonged odor exposure is pervasive across taxa and can be accompanied by the activation of xenobiotic responses.
Affiliation(s)
- Fredis Mappin
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, Florida, United States of America
- Anthony J. Bellantuono
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, Florida, United States of America
- Babak Ebrahimi
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, Florida, United States of America
- Matthew DeGennaro
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, Florida, United States of America
6
Rajapaksa S, Konagurthu AS, Lesk AM. Sequence and structure alignments in post-AlphaFold era. Curr Opin Struct Biol 2023; 79:102539. [PMID: 36753924 DOI: 10.1016/j.sbi.2023.102539] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/13/2022] [Accepted: 01/02/2023] [Indexed: 02/09/2023]
Abstract
Sequence alignment is fundamental for analyzing protein structure and function. For all but closely related proteins, alignments based on structures are more accurate than alignments based purely on amino-acid sequences. However, the disparity between the large amount of sequence data and the relative paucity of experimentally determined structures has precluded the general applicability of structure alignment. Based on the success of AlphaFold (and similar methods) in producing high-quality structure predictions, we suggest that when aligning homologous proteins that lack experimental structures, better results can be obtained by a structural alignment of predicted structures than by an alignment based only on amino-acid sequences. We present a quantitative evaluation, based on pairwise alignments of sequences and structures (both predicted and experimental), to support this hypothesis.
Affiliation(s)
- Sandun Rajapaksa
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, 3800, Victoria, Australia
- Arun S Konagurthu
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, 3800, Victoria, Australia
- Arthur M Lesk
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, 16802, Pennsylvania, USA
7
Mappin F, Bellantuono AJ, Ebrahimi B, DeGennaro M. Odor-evoked transcriptomics of Aedes aegypti mosquitoes. bioRxiv 2023:2023.03.12.532230. [PMID: 36993705 PMCID: PMC10055012 DOI: 10.1101/2023.03.12.532230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Modulation of odorant receptor mRNA induced by prolonged odor exposure is highly correlated with ligand-receptor interactions in Drosophila as well as in mammals of the Muridae family. If this response feature is conserved in other organisms, it presents a potentially potent initial screening tool when searching for novel receptor-ligand interactions in species with predominantly orphan olfactory receptors. We demonstrate that mRNA modulation in response to 1-octen-3-ol odor exposure occurs in a time- and concentration-dependent manner in Aedes aegypti mosquitoes. To investigate gene expression patterns at a global level, we generated an odor-evoked transcriptome associated with 1-octen-3-ol odor exposure. Transcriptomic data revealed that ORs and OBPs were transcriptionally responsive whereas other chemosensory gene families showed little to no differential expression. Alongside chemosensory gene expression changes, transcriptomic analysis found that prolonged exposure to 1-octen-3-ol modulated xenobiotic response genes, primarily members of the cytochrome P450, insect cuticle protein, and glucuronosyltransferase families. Together, these findings suggest that mRNA transcriptional modulation caused by prolonged odor exposure is pervasive across taxa and accompanied by the activation of xenobiotic responses. Furthermore, odor-evoked transcriptomics creates a potential screening tool for filtering and identifying chemosensory and xenobiotic targets of interest.
Affiliation(s)
- Fredis Mappin
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, FL 33199, USA
- Anthony J. Bellantuono
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, FL 33199, USA
- Babak Ebrahimi
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, FL 33199, USA
- Matthew DeGennaro
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, FL 33199, USA
8
Lesk AM, Konagurthu AS. Protein structure prediction improves the quality of amino‐acid sequence alignment. Proteins 2022; 90:2144-2147. [DOI: 10.1002/prot.26392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Revised: 06/01/2022] [Accepted: 06/21/2022] [Indexed: 11/06/2022]
Affiliation(s)
- Arthur M. Lesk
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
- Arun S. Konagurthu
- Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
9
Sumanaweera D, Allison L, Konagurthu AS. Bridging the gaps in statistical models of protein alignment. Bioinformatics 2022; 38:i229-i237. [PMID: 35758809 PMCID: PMC9235498 DOI: 10.1093/bioinformatics/btac246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
Summary: Sequences of proteins evolve by accumulating substitutions together with insertions and deletions (indels) of amino acids. However, it remains a common practice to disconnect substitutions and indels, and infer approximate models for each of them separately, to quantify sequence relationships. Although this approach brings with it computational convenience (which remains its primary motivation), there is a dearth of attempts to unify and model them systematically and together. To overcome this gap, this article demonstrates how a complete statistical model quantifying the evolution of pairs of aligned proteins can be constructed using a time-parameterized substitution matrix and a time-parameterized alignment state machine. Methods to derive all parameters of such a model from any benchmark collection of aligned protein sequences are described here. This has not only allowed us to generate a unified statistical model for each of the nine widely used substitution matrices (PAM, JTT, BLOSUM, JO, WAG, VTML, LG, MIQS and PFASUM), but also resulted in a new unified model, MMLSUM. Our underlying methodology measures the Shannon information content using each model to explain losslessly any given collection of alignments, which has allowed us to quantify the performance of all the above models on six comprehensive alignment benchmarks. Our results show that MMLSUM achieves a new and clear overall best performance, followed by PFASUM, VTML, BLOSUM and MIQS, respectively, amongst the top five. We further analyze the statistical properties of the MMLSUM model and contrast it with others. Supplementary information: Supplementary data are available at Bioinformatics online.
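The idea of ranking models by the Shannon information content needed to explain alignments can be sketched in a few lines: each symbol costs $-\log_2 p$ bits under a model, and the model yielding the shorter total message explains the data better. The example below is a simplified assumption-laden illustration, not the paper's encoding. It scores only a string of alignment states (match, insert, delete) with fixed, invented probabilities, and it ignores both the substitution matrix and the cost of stating the model parameters themselves.

```python
import math


def message_length_bits(symbols, probs):
    """Total Shannon information content, in bits, of `symbols` under `probs`."""
    return sum(-math.log2(probs[s]) for s in symbols)


# Hypothetical alignment state string: M = match, I = insert, D = delete.
states = "MMMMIMMD"

# Two invented models of the alignment states.
match_heavy = {"M": 0.8, "I": 0.1, "D": 0.1}
uniform = {"M": 1 / 3, "I": 1 / 3, "D": 1 / 3}

bits_match_heavy = message_length_bits(states, match_heavy)
bits_uniform = message_length_bits(states, uniform)
```

Because the toy state string is dominated by matches, the match-heavy model encodes it in fewer bits than the uniform model, which is the sense in which a shorter lossless message indicates a better-fitting model.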
Affiliation(s)
- Dinithi Sumanaweera
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Lloyd Allison
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Arun S Konagurthu
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia