1
Sahu VK, Sur S, Agarwal S, Madhyastha H, Ranjan A, Basu S. Unveiling theranostic potential: Insights into cell-free microRNA-protein interactions. Biophys Chem 2025; 322:107421. [PMID: 40048894 DOI: 10.1016/j.bpc.2025.107421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 09/02/2024] [Accepted: 03/01/2025] [Indexed: 04/27/2025]
Abstract
MicroRNAs (miRNAs) are a class of short, endogenous non-coding RNAs that have been well studied for their crucial role in regulating cellular homeostasis. They modulate diverse biological pathways by interacting with cellular proteins, genes, and RNAs through cellular communication, which makes them promising biomarkers and therapeutic targets. However, studying disease-specific miRNA-protein interactions in extracellular or cell-free environments for drug discovery and biomarker establishment is challenging and resource-intensive because of their structural complexity. In this study, we present a computational approach that leverages existing experimental data to uncover patterns in miRNA-protein interactions in the cell-free milieu. We employed motif discovery tools, extracted motifs from 3D protein and miRNA structures, and conducted molecular docking analyses to identify and rank these interactions. This in silico approach reveals 204 and 2874 consensus sequences in miRNAs and proteins, respectively, within the interactome, highlighting their potential roles in cardiovascular diseases, neurological disorders, and cancers. The roles of proteins such as METTL3 and AGO2 and of miRNAs such as the hsa-miR-484 and hsa-miR-30 families and hsa-miR-126-5p are discussed in context. Additionally, we discovered simple sequence repeats with unexplored functional roles in the consensus patterns. Our observations provide new insights into the extracellular miRNA-protein interactions that may drive disease initiation and progression, offering potential avenues for overcoming challenges such as therapy relapse and drug inefficacy. The results of our analysis are available in the miRPin database (https://www.mirna.in/miRPin).
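The abstract reports simple sequence repeats inside the consensus patterns but does not say how they were detected. The following is a minimal sketch, assuming a simple sequence repeat is a unit of 1 to 3 characters occurring at least three times in tandem; these thresholds and the function name `find_ssrs` are illustrative assumptions, not part of the miRPin pipeline.

```python
def find_ssrs(seq, min_unit=1, max_unit=3, min_copies=3):
    """Report tandem runs of a short unit as (start, unit, copies), 0-based.

    Hypothetical helper: at each position every unit length in
    [min_unit, max_unit] is tried, the run covering the most characters is
    kept (shorter units win ties), and the scan then jumps past that run.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for this unit length
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            covered = copies * k
            if copies >= min_copies and (best is None or covered > best[1] * len(best[0])):
                best = (unit, copies)
        if best is None:
            i += 1
        else:
            hits.append((i, best[0], best[1]))
            i += best[1] * len(best[0])
    return hits


if __name__ == "__main__":
    # Toy strings for illustration only, not sequences from the study.
    print(find_ssrs("GAUUUUC"))     # [(2, 'U', 4)]
    print(find_ssrs("CAGCAGCAGU"))  # [(0, 'CAG', 3)]
```

Preferring the shorter unit on ties means a run such as "UUUU" is reported as four copies of "U" rather than two copies of "UU".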
Affiliation(s)
- Vishal Kumar Sahu
- Cancer and Translational Research Centre, Dr. D. Y. Patil Biotechnology and Bioinformatics Institute, Dr. D. Y. Patil Vidyapeeth, Tathawade, Pune 411033, India
- Subhayan Sur
- Cancer and Translational Research Centre, Dr. D. Y. Patil Biotechnology and Bioinformatics Institute, Dr. D. Y. Patil Vidyapeeth, Tathawade, Pune 411033, India
- Sanjana Agarwal
- Cancer and Translational Research Centre, Dr. D. Y. Patil Biotechnology and Bioinformatics Institute, Dr. D. Y. Patil Vidyapeeth, Tathawade, Pune 411033, India
- Harishkumar Madhyastha
- Department of Cardiovascular Physiology, Faculty of Medicine, University of Miyazaki, Miyazaki 8891692, Japan
- Amit Ranjan
- Cancer and Translational Research Centre, Dr. D. Y. Patil Biotechnology and Bioinformatics Institute, Dr. D. Y. Patil Vidyapeeth, Tathawade, Pune 411033, India
- Soumya Basu
- Cancer and Translational Research Centre, Dr. D. Y. Patil Biotechnology and Bioinformatics Institute, Dr. D. Y. Patil Vidyapeeth, Tathawade, Pune 411033, India
2
Górna MW, Merski M. Discovery and Analysis of Repeat and Low-Complexity Architectures in Proteins and Their Conserved Evolutionary Relationships Using Self-Homology Dot Plots. Methods Mol Biol 2025; 2870:95-116. [PMID: 39543033 DOI: 10.1007/978-1-0716-4213-9_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
Proteins that contain sequence repetitions and low complexity regions can be analyzed using self-homology dot plot analysis. Dot plots readily identify protein sequence repeats: the number of repeats, their length, and their location within the protein sequence can be read directly from the plot without pre-defining any of these attributes, which makes the method largely model-independent. We discuss the criteria for statistical identification of protein repeats and recommend simple ways of identifying them. While higher levels of sequence conservation within the repeats make them easier to identify formally, this method can identify protein repeats with fairly low levels of conservation, as well as, notably, non-tandem repetitions in which sizeable sections of complex, non-repeat sequence separate the individual repeat instances. Furthermore, even simple visual examination of these dot plots can reveal conserved patterns within families of closely related proteins, and the level of this conservation can be readily quantified using a Jaccard index. Exhaustive pairwise comparisons can be assembled using hierarchical clustering methods to obtain a picture of the conserved repeat architectures within families of repeat proteins.
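The chapter's statistical criteria are not reproduced in this abstract. As a minimal sketch under simplifying assumptions (exact identity of fixed-length windows instead of a substitution-matrix score, and plots of equal size when computing the Jaccard index), the code below builds a binary self-homology dot plot, scores each off-diagonal, and compares two plots with the Jaccard index |A ∩ B| / |A ∪ B|. All helper names are hypothetical. A repeat shows up as an off-diagonal with a high match fraction, and the offset of that diagonal gives the repeat period.

```python
import numpy as np


def self_dot_plot(seq, window=1):
    """Boolean matrix: cell (i, j) is True when seq[i:i+window] == seq[j:j+window]."""
    n = len(seq) - window + 1
    plot = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            plot[i, j] = seq[i:i + window] == seq[j:j + window]
    return plot


def diagonal_scores(plot):
    """Fraction of matching cells on each upper off-diagonal, keyed by offset."""
    n = plot.shape[0]
    return {d: float(np.diagonal(plot, offset=d).mean()) for d in range(1, n)}


def jaccard(a, b):
    """Jaccard index of two equally shaped boolean plots (1.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0


if __name__ == "__main__":
    # Toy sequence with a period-4 repeat (illustrative, not from the chapter).
    plot = self_dot_plot("ABCDABCDABCD")
    scores = diagonal_scores(plot)
    print({d: s for d, s in scores.items() if s == 1.0})  # {4: 1.0, 8: 1.0}
```

Comparing plots of proteins with different lengths would first require cropping or aligning them to a common frame; that step is left out here.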
Affiliation(s)
- Maria W Górna
- Structural Biology Group, Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Matthew Merski
- i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
3
Felício D, Martins S, Alves GP, Amorim A, Macedo‐Ribeiro S, Merski M. Evolutionary model of repeat insertions in Ataxin-3 traces the origin of the polyglutamine stretch to an ancestral ubiquitin binding module. Protein Sci 2024; 33:e5236. [PMID: 39589068 PMCID: PMC11590126 DOI: 10.1002/pro.5236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 11/08/2024] [Accepted: 11/09/2024] [Indexed: 11/27/2024]
Abstract
The human ataxin-3 protein contains an N-terminal Josephin domain, composed of a papain-like cysteine protease with a helical hairpin insertion, and a C-terminal region with two or three ubiquitin interacting motifs and a polyglutamine tract. Expansion of the polyglutamine tract leading to protein aggregation and neuronal degradation has been linked to Machado-Joseph disease/spinocerebellar ataxia type 3, the most common form of dominantly inherited ataxia. In this study, we performed sequence self-homology dot plot analysis and compared orthologous proteins to analyze the architecture of ataxin-3 during the evolution of Filozoa. This analysis uncovered up to three additional repetitions of the ubiquitin binding motif in ataxin-3, including the helical hairpin insertion in the Josephin domain, and revealed a highly conserved multimodular architecture that is broadly preserved throughout the Filozoa. Overall, a set of 78 putative ubiquitin binding repeats from 18 exemplar proteins were identified. Apparent neofunctionalization events could also be recognized, including modification of repeat 5 which gave rise to the disease-linked polyglutamine tract, just before the Sarcopterygian divergence. This model provides a unifying principle for the ataxin-3 protein architecture and can potentially provide new insights into the role of molecular interactions in ataxin-3 function and Machado-Joseph disease/spinocerebellar ataxia type 3 disease mechanisms.
Affiliation(s)
- Daniela Felício
- i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- IPATIMUP - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- ICBAS - Instituto Ciências Biomédicas Abel Salazar, University of Porto, Porto, Portugal
- Sandra Martins
- i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- IPATIMUP - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- António Amorim
- i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- IPATIMUP - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- Faculty of Sciences, University of Porto, Porto, Portugal
- Sandra Macedo‐Ribeiro
- i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, Porto, Portugal
- Matthew Merski
- i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, Porto, Portugal
4
Arrías PN, Osmanli Z, Peralta E, Chinestrad PM, Monzon AM, Tosatto SCE. Diversity and structural-functional insights of alpha-solenoid proteins. Protein Sci 2024; 33:e5189. [PMID: 39465903 PMCID: PMC11514114 DOI: 10.1002/pro.5189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 09/25/2024] [Accepted: 09/29/2024] [Indexed: 10/29/2024]
Abstract
Alpha-solenoids are a significant and diverse subset of structured tandem repeat proteins (STRPs) that are important in various domains of life. This review examines their structural and functional diversity and their role in critical cellular processes such as signaling, apoptosis, and transcriptional regulation. Alpha-solenoids can be classified into three geometric folds (low curvature, high curvature, and corkscrew) and eight subfolds: ankyrin repeats; Huntingtin, elongation factor 3, protein phosphatase 2A, and target of rapamycin; armadillo repeats; tetratricopeptide repeats; pentatricopeptide repeats; Pumilio repeats; transcription activator-like; and Sel-1 and Sel-1-like repeats. These subfolds represent distinct protein families with unique structural properties and functions, illustrating the versatility of alpha-solenoids. The review also discusses their association with disease, their potential as therapeutic targets, and their role in protein design. Advances in state-of-the-art structure prediction methods provide new opportunities and challenges for the functional characterization and classification of this kind of fold, emphasizing the need for continued development of identification methods and for proper data curation and deposition in the main databases.
Affiliation(s)
- Paula Nazarena Arrías
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Zarifa Osmanli
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Estefanía Peralta
- Laboratorio de Investigación y Desarrollo de Bioactivos (LIDeB), Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina
- Silvio C. E. Tosatto
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
5
Izert MA, Szybowska PE, Górna MW, Merski M. The Effect of Mutations in the TPR and Ankyrin Families of Alpha Solenoid Repeat Proteins. Front Bioinform 2021; 1:696368. [PMID: 36303725 PMCID: PMC9581033 DOI: 10.3389/fbinf.2021.696368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 06/22/2021] [Indexed: 11/20/2022]
Abstract
Protein repeats are short, highly similar peptide motifs that occur several times within a single protein, for example the TPR and Ankyrin repeats. Understanding the role of mutation in these proteins is complicated by two competing facts: 1) the repeats are much more restricted to a set sequence than non-repeat proteins, so mutations should be harmful much more often, because more residues are heavily constrained by the need for the sequence to repeat; and 2) the symmetry of the repeats allows functional contributions to be distributed over a number of residues, so that sometimes no single site is solely responsible for function (unlike the catalytic residues of an enzyme active site). To address this issue, we review the effects of mutations in a number of natural repeat proteins from the tetratricopeptide and Ankyrin repeat families. We find that mutations are context dependent: some mutations are indeed highly disruptive to the function of the protein repeats, while mutations at identical positions in other repeats of the same protein have little to no effect on structure or function.
Affiliation(s)
- Matthew Merski
- Correspondence: Maria Wiktoria Górna; Matthew Merski
6
Rajathei DM, Parthasarathy S, Selvaraj S. Identification and Analysis of Long Repeats of Proteins at the Domain Level. Front Bioeng Biotechnol 2019; 7:250. [PMID: 31649924 PMCID: PMC6795024 DOI: 10.3389/fbioe.2019.00250] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/18/2019] [Accepted: 09/16/2019] [Indexed: 12/27/2022]
Abstract
Amino acid repeats play an important role in the structure and function of proteins. Analysis of long repeats in protein sequences enables one to understand their abundance, structure, and function in the protein universe. In the present study, amino acid repeats of length >50 (long repeats) were identified in a non-redundant set of UniProt sequences using the RADAR program. The underlying structures and functions of these long repeats were analyzed using Gene3D for structural domains, Pfam for functional domains, and an enzyme and non-enzyme functional classification for the catalytic and binding functions of the proteins. From a structural perspective, these long repeats seem to occur predominantly in certain architectures, such as sandwich, bundle, barrel, and roll, and within these architectures they are abundant in the superfolds. The lengths of the repeats within each fold are not uniform, exhibiting different structures for different functions. We also observed that long repeats lie in the domain regions of the family and are involved in the function of the proteins. After grouping by enzyme and non-enzyme classes, we observed an abundant occurrence of long repeats in specific catalytic and binding functions of the proteins. In this study, we have analyzed the occurrence of long repeats in the protein sequence universe, as distinct from the well-characterized short tandem repeats, together with their structures and functions at the domain level. The present study suggests that long repeats may play an important role in the structure and function of protein domains.
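The study maps long repeats (>50 residues, detected with RADAR) onto Gene3D and Pfam domain boundaries, but the abstract does not describe how the mapping was done. The following is a hypothetical sketch that assumes repeats and domains are given as 1-based inclusive residue ranges and that a repeat belongs to a domain when at least half of its residues fall inside it; the 0.5 cut-off and the function name are assumptions.

```python
def long_repeats_in_domains(repeats, domains, min_len=51, min_overlap=0.5):
    """Keep repeats longer than 50 residues and assign each to overlapping domains.

    repeats: list of (start, end) residue ranges, 1-based inclusive.
    domains: dict mapping a domain name to its (start, end) range, same convention.
    A repeat is assigned to a domain when at least `min_overlap` of the
    repeat's residues fall inside that domain (hypothetical rule).
    """
    result = []
    for r_start, r_end in repeats:
        length = r_end - r_start + 1
        if length < min_len:
            continue  # only "long" repeats (>50 residues) are considered
        hits = []
        for name, (d_start, d_end) in domains.items():
            overlap = min(r_end, d_end) - max(r_start, d_start) + 1
            if overlap > 0 and overlap / length >= min_overlap:
                hits.append(name)
        result.append(((r_start, r_end), hits))
    return result


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    repeats = [(10, 40), (100, 170), (300, 380)]
    domains = {"D1": (90, 200), "D2": (320, 500)}
    print(long_repeats_in_domains(repeats, domains))
    # [((100, 170), ['D1']), ((300, 380), ['D2'])]
```

Measuring overlap relative to the repeat length, rather than the domain length, asks whether the repeat sits inside a domain, which matches the abstract's observation that long repeats lie in domain regions.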
Affiliation(s)
- David Mary Rajathei
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
- Subbiah Parthasarathy
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
- Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
7
Turjanski P, Ferreiro DU. On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences. J Phys Chem B 2018; 122:11295-11301. [PMID: 30239207 DOI: 10.1021/acs.jpcb.8b07206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/10/2023]
Abstract
All known terrestrial proteins are coded as continuous strings over an alphabet of ≈20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describe the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein sequences using a mathematically precise definition of "repetition", an efficient algorithmic implementation, and a robust scoring system with no adjustable parameters. We show that the sequence patterns can be well separated into disjoint classes according to their recurrence in nested structures. The statistics of pattern occurrences indicate that short repetitions are sufficient to distinguish natural families from randomized groups of sequences by more than 10 standard deviations, while contiguous sequence patterns shorter than 5 residues are effectively random in their occurrences. A small subset of patterns is sufficient to define a robust "familiarity" between arbitrary sets of sequences.
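The authors' definition of repetition and their parameter-free scoring are not given in the abstract. The sketch below is a much simpler stand-in that only illustrates the general idea of comparing a family against randomized sequences: it counts distinct k-mers shared by at least two sequences and expresses that count as a z-score against per-sequence shuffles that preserve amino acid composition. The choice of k, the shuffle null model, and all names are assumptions and are not the published method.

```python
import math
import random
import statistics


def shared_kmers(seqs, k):
    """Number of distinct k-mers present in at least two of the sequences."""
    owners = {}
    for idx, s in enumerate(seqs):
        for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
            owners.setdefault(kmer, set()).add(idx)
    return sum(1 for o in owners.values() if len(o) >= 2)


def shuffled(seqs, rng):
    """Shuffle residues within each sequence, preserving its composition."""
    out = []
    for s in seqs:
        chars = list(s)
        rng.shuffle(chars)
        out.append("".join(chars))
    return out


def repetition_zscore(seqs, k, n_shuffles=200, seed=0):
    """z-score of the observed shared-k-mer count against shuffled families."""
    rng = random.Random(seed)
    observed = shared_kmers(seqs, k)
    null = [shared_kmers(shuffled(seqs, rng), k) for _ in range(n_shuffles)]
    mu, sd = statistics.mean(null), statistics.pstdev(null)
    if sd == 0:
        # Degenerate null distribution: report direction only.
        return 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    return (observed - mu) / sd
```

For example, `shared_kmers(["ABCDE", "XBCDY"], 3)` is 1, because "BCD" is the only 3-mer present in both strings.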
Affiliation(s)
- Pablo Turjanski
- KAPOW, Departamento de Computación, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-ICC, Buenos Aires, Argentina
- Diego U Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
8
Inferring repeat-protein energetics from evolutionary information. PLoS Comput Biol 2017; 13:e1005584. [PMID: 28617812 PMCID: PMC5491312 DOI: 10.1371/journal.pcbi.1005584] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Revised: 06/29/2017] [Accepted: 05/21/2017] [Indexed: 11/19/2022]
Abstract
Natural protein sequences contain a record of their history. A common constraint in a given protein family is the ability to fold to specific structures, and it has been shown that the main native ensemble can be inferred by analyzing covariations in extant sequences. Still, many natural proteins that fold into the same structural topology show different stabilization energies, and these are often related to their physiological behavior. We propose a description of the energetic variation caused by sequence modifications in repeat proteins, systems in which the overall problem is simplified by their inherent symmetry. We explicitly account for single amino acid and pairwise interactions and treat higher-order correlations with a single term. We show that the resulting evolutionary field can be interpreted with structural detail. We trace the variations in the energetic scores of natural proteins and relate them to their experimental characterization. The resulting energetic evolutionary field allows the prediction of the folding free energy change for several mutants and can be used to generate synthetic sequences that are statistically indistinguishable from their natural counterparts.
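The abstract describes an evolutionary field built from single-residue and pairwise terms. A common form for such models is a Potts-like energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j). In the sketch below the authors' additional higher-order term is omitted, and the sign convention, the 21-letter alphabet, and the toy parameters are assumptions for illustration only. It scores a sequence and the energy change of a point mutation, the kind of quantity one would compare with measured folding free energy changes.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus a gap symbol (assumed)
IDX = {a: n for n, a in enumerate(ALPHABET)}


def potts_energy(seq, h, J):
    """E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j]."""
    s = [IDX[a] for a in seq]
    energy = -sum(h[i, s[i]] for i in range(len(s)))
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            energy -= J[i, j, s[i], s[j]]
    return float(energy)


def mutation_delta(seq, pos, new_aa, h, J):
    """Energy change E(mutant) - E(wild type) for one substitution at pos (0-based)."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)


if __name__ == "__main__":
    # Toy 4-site model with made-up parameters (not fitted to any family).
    L, q = 4, len(ALPHABET)
    h = np.zeros((L, q))
    J = np.zeros((L, L, q, q))
    h[1, IDX["L"]] = 1.0
    J[0, 1, IDX["A"], IDX["L"]] = 0.5
    print(potts_energy("ALGK", h, J))            # -1.5
    print(mutation_delta("ALGK", 1, "G", h, J))  # 1.5
```

Under this sign convention a positive change means the mutant is less favored by the model; whether that maps onto destabilization depends on how the field is calibrated against experiment.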
9
Paladin L, Hirsh L, Piovesan D, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures. Nucleic Acids Res 2016; 45:D308-D312. [PMID: 27899671 PMCID: PMC5210593 DOI: 10.1093/nar/gkw1136] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/15/2016] [Revised: 10/20/2016] [Accepted: 10/31/2016] [Indexed: 12/19/2022]
Abstract
RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins that carry heterogeneous functions and are involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema, including high-quality annotations for ∼5400 protein structures. RepeatsDB 2.0 provides start and end positions of the repeat regions and units for all entries. This extensive growth in repeat unit characterization was made possible by applying the novel ReUPred annotation method to the entire Protein Data Bank, with data quality guaranteed by extensive manual validation of >60% of the entries. The updated web interface includes a new search engine for complex queries and a fully redesigned entry page for a better overview of structural data. It is now possible to compare unit positions together with secondary structure, fold information, and Pfam domains. Moreover, a new classification level has been introduced on top of the existing scheme as an independent layer for sequence similarity relationships at 40%, 60%, and 90% identity.
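RepeatsDB 2.0 stores start and end positions for each repeat region and for its units. The record below is a hypothetical sketch of how such an annotation could be held and sanity-checked in code. The class and field names are assumptions and do not reflect the RepeatsDB schema or API, and positions are taken as 1-based and inclusive.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RepeatRegion:
    """Hypothetical repeat-region annotation with its repeat units."""
    chain_id: str  # placeholder identifier, e.g. a PDB code plus chain letter
    start: int     # first residue of the region (1-based, inclusive)
    end: int       # last residue of the region (inclusive)
    units: List[Tuple[int, int]] = field(default_factory=list)  # (start, end) per unit

    def is_consistent(self) -> bool:
        """True if units are ordered, non-overlapping and lie inside the region."""
        prev_end = self.start - 1
        for u_start, u_end in self.units:
            if u_start > u_end or u_start <= prev_end or u_end > self.end:
                return False
            prev_end = u_end
        return True

    def unit_lengths(self) -> List[int]:
        """Length of each unit in residues."""
        return [u_end - u_start + 1 for u_start, u_end in self.units]


if __name__ == "__main__":
    # Placeholder identifier and coordinates, not a real RepeatsDB entry.
    region = RepeatRegion("XXXXA", 10, 129, [(10, 49), (50, 89), (90, 129)])
    print(region.is_consistent(), region.unit_lengths())  # True [40, 40, 40]
```

Checking unit order and containment up front catches annotations in which units overlap or spill outside their region before any per-unit comparison is made.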
Affiliation(s)
- Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Layla Hirsh
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Departamento de Ingeniería, Pontificia Universidad Católica del Perú, 32 Lima, Perú
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Miguel A Andrade-Navarro
- Institute of Molecular Biology, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Andrey V Kajava
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, Université Montpellier, 34293 Montpellier, France
- Institut de Biologie Computationnelle (IBC), 34293 Montpellier, France
- Institute of Bioengineering, University ITMO, 197101 St. Petersburg, Russia
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- CNR Institute of Neuroscience, 35121 Padova, Italy