1
Shave S, Isaksson R, Pham NT, Elliott RJR, Dawson JC, Soudant J, Carragher NO, Auer M. Cellular Activity of CQWW Nullomer-Derived Peptides. ACS Omega 2025; 10:6794-6800. [PMID: 40028100 PMCID: PMC11865978 DOI: 10.1021/acsomega.4c08860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 01/07/2025] [Accepted: 01/23/2025] [Indexed: 03/05/2025]
Abstract
Analysis of observed protein sequences across all species within the UniProtKB/Swiss-Prot data set reveals CQWW as the shortest absent stretch of amino acids. While DNA can be found encoding the CQWW sequence, it has never been observed to be translated or included in manually curated sets of proteins, existing only in predicted, tentative sequences and in a single mature antibody sequence. We have synthesized this "nullomer" peptide, along with 13 derivatives, reversed, truncated, stereoisomers, and alanine-scanning peptides, conjugated to polyarginine stretches to increase cellular uptake. We observed their impact against a healthy neuronal line and six patient-derived glioblastoma cell lines spanning three clinical subtypes. Results reveal IC50 values averaging 4.9 μM for inhibition of cell survival across tested oncogenic cell lines. High-content phenotypic analysis of cellular features and reverse-phase protein arrays failed to discern a clear mode of action for the nullomer peptide but suggests mitochondrial impairment through the inhibition of GSK3α and β isoforms, supported by observations of reduced mitochondrial stain intensities. With a recent increase in interest in nullomer peptides, we see the results in this study as a starting point for further investigation into this potentially therapeutic peptide class.
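The search for a shortest absent peptide described in this abstract can be illustrated with a minimal sketch. It assumes a small in-memory list of protein sequences rather than the UniProtKB/Swiss-Prot data set, and the function names (`observed_kmers`, `shortest_absent_kmers`) are hypothetical, not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def observed_kmers(sequences, k):
    """Collect every length-k substring that occurs in any sequence."""
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return seen


def shortest_absent_kmers(sequences, max_k=4, alphabet=AMINO_ACIDS):
    """Return (k, absent) for the smallest k with at least one absent k-mer.

    Hypothetical helper: enumerates all alphabet**k candidates, so it is
    only practical for small k. Returns (None, []) if nothing is absent
    up to max_k.
    """
    for k in range(1, max_k + 1):
        seen = observed_kmers(sequences, k)
        candidates = ("".join(p) for p in product(alphabet, repeat=k))
        absent = sorted(c for c in candidates if c not in seen)
        if absent:
            return k, absent
    return None, []
```

With a toy two-letter alphabet, `shortest_absent_kmers(["AACA"], alphabet="AC")` returns `(2, ["CC"])`: both single letters occur, but the dipeptide `CC` does not.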
Affiliation(s)
- Steven Shave
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, Crewe Road South, Edinburgh EH4 2XR, U.K.
  - School of Biological Sciences, University of Edinburgh, The King’s Buildings, Edinburgh EH9 3BF, U.K.
- Rebecka Isaksson
  - School of Biological Sciences, University of Edinburgh, The King’s Buildings, Edinburgh EH9 3BF, U.K.
  - Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, U.K.
- Nhan T. Pham
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, Crewe Road South, Edinburgh EH4 2XR, U.K.
  - School of Biological Sciences, University of Edinburgh, The King’s Buildings, Edinburgh EH9 3BF, U.K.
  - College of Medicine and Veterinary Medicine, University of Edinburgh, Institute for Regeneration and Repair, 4-5 Little France Drive, Edinburgh EH16 4UU, U.K.
- Richard J. R. Elliott
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, Crewe Road South, Edinburgh EH4 2XR, U.K.
- John C. Dawson
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, Crewe Road South, Edinburgh EH4 2XR, U.K.
- Julius Soudant
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, Crewe Road South, Edinburgh EH4 2XR, U.K.
  - Departamento de Farmacologia, Facultad de Medicina, Universidad Autónoma de Madrid, Calle Arzobispo Morcillo 4, Madrid 28029, Spain
- Neil O. Carragher
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, Crewe Road South, Edinburgh EH4 2XR, U.K.
- Manfred Auer
  - School of Biological Sciences, University of Edinburgh, The King’s Buildings, Edinburgh EH9 3BF, U.K.
2
Bochalis E, Patsakis M, Chantzi N, Mouratidis I, Chartoumpekis D, Georgakopoulos-Soares I. Unraveling diversity by isolating peptide sequences specific to distinct taxonomic groups. bioRxiv 2025:2025.02.05.636664. [PMID: 39975352 PMCID: PMC11839104 DOI: 10.1101/2025.02.05.636664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development and can have widespread applications in pathogen diagnostics, human healthcare, ecology and the characterization of biomes. Here, we investigated the existence of peptide k-mer sequences that are exclusively present in a specific taxonomy and absent in every other taxonomic level, termed taxonomic quasi-primes. By analyzing proteomes across 24,073 species, we identified quasi-prime peptides specific to superkingdoms, kingdoms, and phyla, uncovering their taxonomic distributions and functional relevance. These peptides exhibit remarkable sequence uniqueness at six- and seven-amino-acid lengths, offering insights into evolutionary divergence and lineage-specific adaptations. Moreover, we show that human quasi-prime loci are more prone to harboring pathogenic variants, underscoring their functional significance. This study introduces taxonomic quasi-primes and offers insights into their contributions to proteomic diversity, evolutionary pathways, and functional adaptations across the tree of life, while emphasizing their potential impact on human health and disease.
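The idea of a k-mer that is present in one taxonomic group and absent from every other can be sketched as a set difference. This is a simplified illustration, not the authors' pipeline: the input is a hypothetical dictionary mapping a taxon name to a list of protein sequences, and `quasi_primes` is an assumed name.

```python
def kmers(seq, k):
    """Return the set of length-k substrings of one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(proteomes_by_taxon, k):
    """Map each taxon to the k-mers found in it and in no other taxon.

    proteomes_by_taxon: hypothetical dict of taxon name -> list of
    protein sequences. Real analyses would stream proteome files
    instead of holding everything in memory.
    """
    # Pool the k-mers of all sequences belonging to each taxon.
    per_taxon = {
        taxon: set().union(*(kmers(s, k) for s in seqs))
        for taxon, seqs in proteomes_by_taxon.items()
    }
    result = {}
    for taxon, own in per_taxon.items():
        # Union of the k-mers seen in every other taxon.
        others = set().union(
            *(s for t, s in per_taxon.items() if t != taxon)
        )
        result[taxon] = own - others
    return result
```

For example, with `{"bacteria": ["MKTA"], "archaea": ["MKTC"]}` and `k=3`, the shared 3-mer `MKT` is discarded and each group keeps one specific 3-mer (`KTA` and `KTC` respectively).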
Affiliation(s)
- Eleftherios Bochalis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Department of Internal Medicine, Division of Endocrinology, Medical School, University of Patras, Patras, Greece
- Michail Patsakis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Dionysios Chartoumpekis
  - Department of Internal Medicine, Division of Endocrinology, Medical School, University of Patras, Patras, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
3
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
4
Chantzi N, Mareboina M, Konnaris MA, Montgomery A, Patsakis M, Mouratidis I, Georgakopoulos-Soares I. The determinants of the rarity of nucleic and peptide short sequences in nature. NAR Genom Bioinform 2024; 6:lqae029. [PMID: 38584871 PMCID: PMC10993293 DOI: 10.1093/nargab/lqae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/09/2024]
Abstract
The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. We examined 45 785 reference genomes and 21 871 reference proteomes, spanning archaea, bacteria, eukaryotes and viruses to calculate the rarity of short sequences in them. To capture this, we developed a metric of the rarity of each sequence in nature, the rarity index. We find that the frequency of certain dipeptides in rare oligopeptide sequences is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate predictive regression models that infer the rarity of nucleic and proteomic sequences across nature or within each domain of life and viruses separately. When examining each of the three domains of life and viruses separately, the R² performance of the model predicting rarity for 5-mer peptides from mono- and dipeptides ranged between 0.814 and 0.932. A separate model predicting rarity for 10-mer oligonucleotides from mono- and dinucleotides achieved R² performance between 0.408 and 0.606. Our results indicate that the mono- and dinucleotide composition of nucleic sequences and the mono- and dipeptide composition of peptide sequences can explain a significant proportion of the variance in their frequencies in nature.
Affiliation(s)
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Maxwell A Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
  - Department of Statistics, Penn State University, University Park, PA, 16802, USA
  - Huck Institutes of the Life Sciences, Penn State University, University Park, PA, 16802, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Michail Patsakis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
  - Huck Institutes of the Life Sciences, Penn State University, University Park, PA, 16802, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
5
Montgomery A, Tsiatsianis GC, Mouratidis I, Chan CSY, Athanasiou M, Papanastasiou AD, Kantere V, Syrigos N, Vathiotis I, Syrigos K, Yee NS, Georgakopoulos-Soares I. Utilizing nullomers in cell-free RNA for early cancer detection. Cancer Gene Ther 2024; 31:861-870. [PMID: 38351138 PMCID: PMC11192629 DOI: 10.1038/s41417-024-00741-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/31/2023] [Revised: 01/25/2024] [Accepted: 01/26/2024] [Indexed: 06/23/2024]
Abstract
Early detection of cancer can significantly improve patient outcomes; however, sensitive and highly specific biomarkers for cancer detection are currently missing. Nullomers are the shortest sequences that are absent from the human genome but can emerge due to somatic mutations in cancer. We examine over 10,000 whole exome sequencing matched tumor-normal samples to characterize nullomer emergence across exonic regions of the genome. We also identify nullomer emerging mutational hotspots within tumor genes. Finally, we provide evidence for the identification of nullomers in cell-free RNA from peripheral blood samples, enabling detection of multiple tumor types. We show multiple tumor classification models with an AUC greater than 0.9, including a hepatocellular carcinoma classifier with an AUC greater than 0.99.
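How a somatic mutation can make a previously absent sequence appear may be illustrated with a toy example. The sketch below assumes a short reference string, a fixed k, and a single-base substitution; it treats every k-mer missing from the reference as absent, which is a simplification of the nullomer definition in the abstract. The function names are hypothetical and not from the paper.

```python
def kmers_overlapping(seq, pos, k):
    """Return the k-mers of seq whose window covers index pos (0-based)."""
    start = max(0, pos - k + 1)
    end = min(len(seq) - k, pos)
    return [seq[i:i + k] for i in range(start, end + 1)]


def emerging_absent_kmers(reference, pos, alt, k):
    """List k-mers created by substituting alt at pos that the
    reference never contains.

    Hypothetical helper: only single-base substitutions are handled,
    and the reference is assumed to fit in memory.
    """
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    mutated = reference[:pos] + alt + reference[pos + 1:]
    # Only windows covering the mutated position can contain new k-mers.
    return sorted(
        {km for km in kmers_overlapping(mutated, pos, k) if km not in ref_kmers}
    )
```

For the reference `ACGTACGT`, substituting `G` at index 3 with `k=3` yields `["CGG", "GAC", "GGA"]`, since none of the three windows covering the mutated base occurs in the original string.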
Affiliation(s)
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Georgios Christos Tsiatsianis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S Y Chan
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Maria Athanasiou
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Verena Kantere
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Nikos Syrigos
  - Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
- Ioannis Vathiotis
  - Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
- Konstantinos Syrigos
  - Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
- Nelson S Yee
  - Next Generation Therapies Program, Penn State Cancer Institute; Division of Hematology-Oncology, Department of Medicine, Penn State Health Milton S. Hershey Medical Center, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA