1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
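As a rough illustration of the two ideas this abstract names, the sketch below counts k-mers in a DNA string and lists the k-mers of a given length that never occur in it. The function names `count_kmers` and `absent_kmers` are hypothetical, and the "absent" set is a simplification: it checks one strand only and a single fixed k, whereas published nullomer definitions may differ (for example, by considering both strands or the shortest absent length).

```python
from collections import Counter
from itertools import product

ALPHABET = set("ACGT")

def count_kmers(sequence, k):
    """Count every length-k window made only of A, C, G, T (single strand)."""
    seq = sequence.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= ALPHABET)

def absent_kmers(sequence, k):
    """Return, in sorted order, all 4**k possible k-mers not seen in `sequence`."""
    present = count_kmers(sequence, k)
    candidates = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in candidates if kmer not in present]
```

Enumerating all 4**k candidates is only practical for small k; for larger k a real tool would need a more compact index.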
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2
Ponzetti M, Rucci N, Falone S. RNA methylation and cellular response to oxidative stress-promoting anticancer agents. Cell Cycle 2023; 22:870-905. [PMID: 36648057 PMCID: PMC10054233 DOI: 10.1080/15384101.2023.2165632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 01/03/2023] [Indexed: 01/18/2023]
Abstract
Disruption of the complex network that regulates redox homeostasis often underlies resistant phenotypes, which hinder effective and long-lasting cancer eradication. In addition, the RNA methylome-dependent control of gene expression critically affects traits of cellular resistance to anti-cancer agents. However, few investigations have aimed to establish whether the epitranscriptome-directed adaptations underlying acquired and/or innate resistance traits in cancer could be implemented through redox-dependent or redox-responsive signaling pathways. This is unexpected mainly because: i) the effectiveness of many anti-cancer approaches relies on their capacity to promote oxidative stress (OS); ii) an altered redox milieu and the reprogramming of mitochondrial function have been acknowledged as critical mediators of the RNA methylome-mediated response to OS. Here we summarize the current state of understanding on this topic and offer new perspectives that might lead to original approaches and strategies to delay or prevent refractory cancer and tumor recurrence.
Affiliation(s)
- Marco Ponzetti
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy
- Nadia Rucci
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy
- Stefano Falone
  - Department of Life, Health and Environmental Sciences, University of L’Aquila, L’Aquila, Italy
3
Fu Y, Yang Q, Yang H, Zhang X. New progress in the role of microRNAs in the diagnosis and prognosis of triple negative breast cancer. Front Mol Biosci 2023; 10:1162463. [PMID: 37122564 PMCID: PMC10134903 DOI: 10.3389/fmolb.2023.1162463] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/09/2023] [Accepted: 03/30/2023] [Indexed: 05/02/2023]
Abstract
Triple negative breast cancer (TNBC) is distinguished by its high malignancy, aggressive invasion, rapid progression, frequent recurrence, and distant metastases. It also has a poor prognosis and high mortality, and it is unresponsive to conventional endocrine and targeted therapy, making it a challenging problem for breast cancer treatment and a hotspot for scientific research. Recent research has revealed that certain miRNAs can directly or indirectly affect the occurrence, progression and recurrence of TNBC. Their expression levels have a significant impact on TNBC diagnosis, treatment and prognosis. Some miRNAs can serve as biomarkers for TNBC diagnosis and prognosis. This article summarizes the progress of miRNA research in TNBC, discusses the roles of miRNAs in the occurrence, invasion, metastasis, prognosis, and chemotherapy of TNBC, and proposes a treatment strategy for TNBC based on interfering with miRNA expression levels.
Affiliation(s)
- Yeqin Fu
  - Department of Breast Surgery, Zhejiang Cancer Hospital, Hangzhou, Zhejiang, China
  - Wenzhou Medical University, Wenzhou, Zhejiang, China
- Qiuhui Yang
  - Department of Breast Surgery, Zhejiang Cancer Hospital, Hangzhou, Zhejiang, China
  - Wenzhou Medical University, Wenzhou, Zhejiang, China
- Hongjian Yang
  - Department of Breast Surgery, Zhejiang Cancer Hospital, Hangzhou, Zhejiang, China
  - *Correspondence: Hongjian Yang; Xiping Zhang
- Xiping Zhang
  - Department of Breast Surgery, Zhejiang Cancer Hospital, Hangzhou, Zhejiang, China
  - *Correspondence: Hongjian Yang; Xiping Zhang
4
Wang Y, Xue H, Aglave M, Lainé A, Gallopin M, Gautheret D. The contribution of uncharted RNA sequences to tumor identity in lung adenocarcinoma. NAR Cancer 2022; 4:zcac001. [PMID: 35118386 PMCID: PMC8807116 DOI: 10.1093/narcan/zcac001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2021] [Revised: 11/18/2021] [Accepted: 01/10/2022] [Indexed: 11/12/2022]
Abstract
The identity of cancer cells is defined by the interplay between genetic, epigenetic, transcriptional and post-transcriptional variation. Much of this variation is present in RNA-seq data and can be captured at once using reference-free k-mer analysis. An important issue with k-mer analysis, however, is the difficulty of distinguishing signal from noise. Here, we use two independent lung adenocarcinoma datasets to identify all reproducible events at the k-mer level, in a tumor versus normal setting. We find reproducible events in many different locations (introns, intergenic, repeats) and forms (spliced, polyadenylated, chimeric etc.). We systematically analyze events that are ignored in conventional transcriptomics and assess their value as biomarkers and for tumor classification, survival prediction, neoantigen prediction and correlation with the immune microenvironment. We find that unannotated lincRNAs, novel splice variants, endogenous HERV, Line1 and Alu repeats and bacterial RNAs each contribute to different, important aspects of tumor identity. We argue that differential RNA-seq analysis of tumor/normal sample collections would benefit from this type of k-mer analysis to cast a wider net on important cancer-related events. The code is available at https://github.com/Transipedia/dekupl-lung-cancer-inter-cohort.
Affiliation(s)
- Yunfeng Wang
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
  - Annoroad Gene Technology Co., Ltd, 100176 Beijing, China
- Haoliang Xue
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
- Marine Aglave
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
  - Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France
- Antoine Lainé
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
- Mélina Gallopin
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
  - Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France
5
Lidströmer N, Davids J, Sood HS, Ashrafian H. AIM in Primary Healthcare. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
6
Torkamanian-Afshar M, Nematzadeh S, Tabarzad M, Najafi A, Lanjanian H, Masoudi-Nejad A. In silico design of novel aptamers utilizing a hybrid method of machine learning and genetic algorithm. Mol Divers 2021; 25:1395-1407. [PMID: 33554306 DOI: 10.1007/s11030-021-10192-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/11/2020] [Accepted: 01/28/2021] [Indexed: 12/29/2022]
Abstract
Aptamers can be regarded as efficient substitutes for monoclonal antibodies in many diagnostic and therapeutic applications. Because SELEX (systematic evolution of ligands by exponential enrichment) is tedious and prohibitive, in silico methods have been developed to speed up the enrichment process. However, most of these methods make no attempt to design novel aptamers. Moreover, some target proteins may not have any binding RNA candidates in nature, and a reductive mechanism is needed to generate novel aptamer pools, from among the enormous number of possible nucleotide combinations, to be examined in vitro. We applied a genetic algorithm (GA) with an embedded binding-predictor fitness function to the in silico design of RNA aptamers. As a case study, all steps were carried out to generate an aptamer pool against the aminopeptidase N (CD13) biomarker. First, the model was developed based on sequential and structural features of known RNA-protein complexes. Then, using RNA sequences involved in complexes with positive prediction results as the first generation, novel aptamers were designed and top-ranked sequences were selected. A 76-mer aptamer was identified with the highest fitness value, scoring 3 to 6 times higher than the parent oligonucleotides. The reliability of the obtained sequences was confirmed using docking and molecular dynamics simulation. The proposed method provides an important, simplified contribution to the oligonucleotide-aptamer design process. It can also serve as a basis for designing novel aptamers against a wide range of biomarkers.
Affiliation(s)
- Mahsa Torkamanian-Afshar
  - Department of Bioinformatics, Kish International Campus, University of Tehran, Kish Island, Iran
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
  - Department of Computer Technologies, Beykent University, Istanbul, Turkey
- Sajjad Nematzadeh
  - Department of Computer Technologies, Beykent University, Istanbul, Turkey
- Maryam Tabarzad
  - Protein Technology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Najafi
  - Molecular Biology Research Center, Systems Biology and Poisonings Institute, Tehran, Iran
- Hossein Lanjanian
  - Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Masoudi-Nejad
  - Department of Bioinformatics, Kish International Campus, University of Tehran, Kish Island, Iran
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
7
Mondal PK, Saha US, Mukhopadhyay I. PseudoGA: cell pseudotime reconstruction based on genetic algorithm. Nucleic Acids Res 2021; 49:7909-7924. [PMID: 34244782 PMCID: PMC8661435 DOI: 10.1093/nar/gkab457] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/19/2020] [Revised: 05/03/2021] [Accepted: 07/07/2021] [Indexed: 01/05/2023]
Abstract
Dynamic regulation of gene expression is often governed by progression through transient cell states. Bulk RNA-seq analysis can only detect average changes in expression levels and is unable to identify these dynamics. Single-cell RNA-seq presents an unprecedented opportunity to place cells on a hypothetical time trajectory that reflects the gradual transition of their transcriptomes. This continuum trajectory, or ‘pseudotime’, may reveal the developmental pathway and provide information on dynamic transcriptomic changes and other biological processes. Existing approaches to building pseudotime depend heavily on reducing very high-dimensional data to extremely low-dimensional subspaces, which may lead to loss of information. We propose PseudoGA, a genetic algorithm-based approach that orders cells on the assumption that gene expression varies according to a smooth curve along the pseudotime trajectory. We observe superior accuracy of our method in simulated as well as real benchmarking datasets. The generality of the assumption behind PseudoGA and its independence from any dimensionality reduction technique make it a robust choice for pseudotime estimation from single-cell transcriptome data. PseudoGA is also time-efficient when applied to large single-cell RNA-seq datasets and is adaptable to parallel computing. R code for PseudoGA is freely available at https://github.com/indranillab/pseudoga.
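The ordering idea in this abstract can be sketched as a search over permutations of cells. The block below is a toy, mutation-only evolutionary search (no crossover), not the PseudoGA implementation, which is distributed as R code at the URL above. The cost function `roughness`, the segment-reversal mutation, and all parameter names are assumptions chosen only to illustrate "smooth expression along the ordering".

```python
import random

def roughness(order, expr):
    """Sum of squared expression changes between consecutive cells in `order`."""
    return sum(
        (expr[a][g] - expr[b][g]) ** 2
        for a, b in zip(order, order[1:])
        for g in range(len(expr[a]))
    )

def mutate(order, rng):
    """Reverse a randomly chosen segment of the ordering."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]

def order_cells(expr, pop_size=30, generations=200, seed=0):
    """Evolve cell orderings toward low roughness; needs at least two cells."""
    rng = random.Random(seed)
    n = len(expr)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the smoother half unchanged (elitism), refill with mutated copies.
        population.sort(key=lambda o: roughness(o, expr))
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda o: roughness(o, expr))
```

Here `expr` is a list of per-cell expression vectors (one list of gene values per cell). An ordering and its reverse have the same cost, so a sketch like this cannot tell which end of the trajectory is the start.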
Affiliation(s)
- Pronoy Kanti Mondal
  - Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
- Udit Surya Saha
  - Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
- Indranil Mukhopadhyay
  - Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
8
Nguyen HTN, Xue H, Firlej V, Ponty Y, Gallopin M, Gautheret D. Reference-free transcriptome signatures for prostate cancer prognosis. BMC Cancer 2021; 21:394. [PMID: 33845808 PMCID: PMC8040209 DOI: 10.1186/s12885-021-08021-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/18/2020] [Accepted: 03/09/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND: RNA-seq data are increasingly used to derive prognostic signatures for cancer outcome prediction. A limitation of current predictors is their reliance on reference gene annotations, which amounts to ignoring large numbers of non-canonical RNAs produced in disease tissues. A recently introduced kind of transcriptome classifier operates entirely in a reference-free manner, relying on k-mers extracted from patient RNA-seq data. METHODS: In this paper, we set out to compare conventional and reference-free signatures in risk and relapse prediction of prostate cancer. To compare the two approaches as fairly as possible, we set up a common procedure that takes as input either a k-mer count matrix or a gene expression matrix, extracts a signature and evaluates this signature in an independent dataset. RESULTS: We find that both gene-based and k-mer based classifiers had similarly high performances for risk prediction and a markedly lower performance for relapse prediction. Interestingly, the reference-free signatures included a set of sequences mapping to novel lncRNAs or variable regions of cancer driver genes that were not part of gene-based signatures. CONCLUSIONS: Reference-free classifiers are thus a promising strategy for the identification of novel prognostic RNA biomarkers.
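To make the "k-mer count matrix" input concrete, here is a minimal sketch of how such a matrix could be assembled from per-sample reads, with one row per sample and one column per k-mer. The function names and the in-memory dictionary of reads are assumptions for illustration; the paper's actual pipeline is not described in this abstract, and real datasets would call for a dedicated k-mer counter.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count all length-k substrings across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def count_matrix(samples, k):
    """Build a samples x k-mers count matrix.

    `samples` maps a sample name to its list of reads. Returns the row labels,
    the sorted column labels (k-mers), and the matrix as a list of rows.
    """
    per_sample = {name: kmer_counts(reads, k) for name, reads in samples.items()}
    kmers = sorted(set().union(*per_sample.values()))
    matrix = [[per_sample[name][kmer] for kmer in kmers] for name in samples]
    return list(samples), kmers, matrix
```

For example, `count_matrix({"s1": ["ACGT"], "s2": ["CGTT", "ACG"]}, 3)` yields columns `ACG`, `CGT`, `GTT`, with a zero in the `GTT` column for `s1`. A gene expression matrix has the same shape, which is what would let one downstream procedure accept either input.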
Affiliation(s)
- Ha T N Nguyen
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France
- Haoliang Xue
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France
- Virginie Firlej
  - Institute of Biology, Université Paris Est Creteil, Creteil, Creteil, France
- Yann Ponty
  - LIX CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Melina Gallopin
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France
9
Lorenzi C, Barriere S, Villemin JP, Dejardin Bretones L, Mancheron A, Ritchie W. iMOKA: k-mer based software to analyze large collections of sequencing data. Genome Biol 2020; 21:261. [PMID: 33050927 PMCID: PMC7552494 DOI: 10.1186/s13059-020-02165-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/06/2020] [Accepted: 09/10/2020] [Indexed: 01/24/2023]
Abstract
iMOKA (interactive multi-objective k-mer analysis) is software that enables comprehensive analysis of sequencing data from large cohorts to generate robust classification models or explore specific genetic elements associated with disease etiology. iMOKA uses a fast and accurate feature reduction step that combines a Naïve Bayes classifier augmented by an adaptive entropy filter and a graph-based filter to rapidly reduce the search space. By using a flexible file format and distributed indexing, iMOKA can easily integrate data from multiple experiments, reduce disk space requirements, and identify changes in transcript levels and single nucleotide variants. iMOKA is available at https://github.com/RitchieLabIGH/iMOKA and on Zenodo at https://doi.org/10.5281/zenodo.4008947.
Affiliation(s)
- Claudio Lorenzi
  - IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
- Sylvain Barriere
  - IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
- Jean-Philippe Villemin
  - IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
- William Ritchie
  - IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
10
Chapman J, Ng YS, Nicholls TJ. The Maintenance of Mitochondrial DNA Integrity and Dynamics by Mitochondrial Membranes. Life (Basel) 2020; 10:life10090164. [PMID: 32858900 PMCID: PMC7555930 DOI: 10.3390/life10090164] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/27/2020] [Revised: 08/20/2020] [Accepted: 08/23/2020] [Indexed: 12/18/2022]
Abstract
Mitochondria are complex organelles that harbour their own genome. Mitochondrial DNA (mtDNA) exists in the form of a circular double-stranded DNA molecule that must be replicated, segregated and distributed around the mitochondrial network. Human cells typically possess between a few hundred and several thousand copies of the mitochondrial genome, located within the mitochondrial matrix in close association with the cristae ultrastructure. The organisation of mtDNA around the mitochondrial network requires mitochondria to be dynamic and undergo both fission and fusion events in coordination with the modulation of cristae architecture. The dysregulation of these processes has profound effects upon mtDNA replication, manifesting as a loss of mtDNA integrity and copy number, and upon the subsequent distribution of mtDNA around the mitochondrial network. Mutations within genes involved in mitochondrial dynamics or cristae modulation cause a wide range of neurological disorders frequently associated with defects in mtDNA maintenance. This review aims to provide an understanding of the biological mechanisms that link mitochondrial dynamics and mtDNA integrity, as well as examine the interplay that occurs between mtDNA, mitochondrial dynamics and cristae structure.
Affiliation(s)
- James Chapman
  - Wellcome Centre for Mitochondrial Research, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
  - Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
  - Correspondence: (J.C.); (T.J.N.)
- Yi Shiau Ng
  - Wellcome Centre for Mitochondrial Research, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
  - Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Thomas J. Nicholls
  - Wellcome Centre for Mitochondrial Research, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
  - Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
  - Correspondence: (J.C.); (T.J.N.)