1
Lee H, Ozbulak U, Park H, Depuydt S, De Neve W, Vankerschaver J. Assessing the reliability of point mutation as data augmentation for deep learning with genomic data. BMC Bioinformatics 2024; 25:170. [PMID: 38689247 PMCID: PMC11059627 DOI: 10.1186/s12859-024-05787-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data.
RESULTS: Most genomic data possesses peculiar properties and data augmentations may significantly alter the intrinsic properties of the data. In this work, we propose a novel data augmentation technique for genomic data inspired by biology: point mutations. By employing point mutations as substitutes for codons, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection.
CONCLUSION: Silent and missense mutations are found to positively influence effectiveness, while nonsense mutations and random mutations in non-coding regions generally lead to degradation. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.
Affiliation(s)
- Utku Ozbulak
  - Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea
- Homin Park
  - Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea
  - IDLab, Department of Electronics and Information Systems, Ghent University, Ghent, Belgium
- Stephen Depuydt
  - Erasmus Brussels University of Applied Sciences and Arts, Brussels, Belgium
- Wesley De Neve
  - Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea
  - IDLab, Department of Electronics and Information Systems, Ghent University, Ghent, Belgium
- Joris Vankerschaver
  - Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
2
Lynn N, Tuller T. Detecting and understanding meaningful cancerous mutations based on computational models of mRNA splicing. NPJ Syst Biol Appl 2024; 10:25. [PMID: 38453965 PMCID: PMC10920900 DOI: 10.1038/s41540-024-00351-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 02/22/2024] [Indexed: 03/09/2024] Open
Abstract
Cancer research has long relied on non-silent mutations. Yet, it has become overwhelmingly clear that silent mutations can affect gene expression and cancer cell fitness. One fundamental mechanism that apparently silent mutations can severely disrupt is alternative splicing. Here we introduce Oncosplice, a tool that scores mutations based on models of proteomes generated using aberrant splicing predictions. Oncosplice leverages a highly accurate neural network that predicts splice sites within arbitrary mRNA sequences, a greedy transcript constructor that considers alternate arrangements of splicing blueprints, and an algorithm that grades the functional divergence between proteins based on evolutionary conservation. By applying this tool to 12M somatic mutations we identify 8K deleterious variants that are significantly depleted within the healthy population; we demonstrate the tool's ability to identify clinically validated pathogenic variants with a positive predictive value of 94%; we show strong enrichment of predicted deleterious mutations across pan-cancer drivers. We also achieve improved patient survival estimation using a proposed set of novel cancer-involved genes. Ultimately, this pipeline enables accelerated insight-gathering of sequence-specific consequences for a class of understudied mutations and provides an efficient way of filtering through massive variant datasets - functionalities with immediate experimental and clinical applications.
Affiliation(s)
- Nicolas Lynn
  - Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, 69978, Israel
- Tamir Tuller
  - Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, 69978, Israel
3
Corzo G, Seeling-Branscomb CE, Seeling JM. Differential Synonymous Codon Selection in the B56 Gene Family of PP2A Regulatory Subunits. Int J Mol Sci 2023; 25:392. [PMID: 38203563 PMCID: PMC10778929 DOI: 10.3390/ijms25010392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/18/2023] [Accepted: 12/23/2023] [Indexed: 01/12/2024] Open
Abstract
Protein phosphatase 2A (PP2A) functions as a tumor suppressor and consists of a scaffolding, catalytic, and regulatory subunit. The B56 gene family of regulatory subunits imparts distinct functions onto PP2A. Codon usage bias (CUB) involves the selection of synonymous codons, which can affect gene expression by modulating processes such as transcription and translation. CUB can vary along the length of a gene, and differential use of synonymous codons can be important in the divergence of gene families. The sequence encoding the N-terminus of the B56α gene product possessed high CUB, high GC content at the third codon position (GC3), and high rare codon content. In addition, differential CUB was found in the sequence encoding two B56γ N-terminal splice forms. The sequence encoding the N-terminus of B56γ/γ, relative to B56δ/γ, displayed CUB, utilized more frequent codons, and had higher GC3 content. B56α mRNA had stronger than predicted secondary structure at its 5' end, and the B56δ/γ splice variants had long regions of weaker than predicted secondary structure at their 5' end. The data suggest that B56α is expressed at relatively low levels as compared to the other B56 isoforms and that the B56δ/γ splice variant is expressed more highly than B56γ/γ.
Affiliation(s)
- Gabriel Corzo
  - Department of Biology, Hofstra University, Hempstead, NY 11549, USA
- Joni M. Seeling
  - Department of Biology, Hofstra University, Hempstead, NY 11549, USA
4
Maitre E, Macro M, Troussard X. Hairy cell leukaemia with unusual BRAF mutations. J Cell Mol Med 2023; 27:2626-2630. [PMID: 37530550 PMCID: PMC10468650 DOI: 10.1111/jcmm.17890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 07/06/2023] [Accepted: 07/22/2023] [Indexed: 08/03/2023] Open
Abstract
Hairy cell leukaemia (HCL) diagnosis is based on the morphologic detection of circulating abnormal hairy cells in the peripheral blood and/or bone marrow, an HCL immunological score of 3 or 4 based on the expression of CD11c, CD25, CD103 and CD123, and the presence of a BRAF V600E activating mutation in the B-raf proto-oncogene (BRAF gene) (7q34). Using next-generation sequencing of 21 targeted genes in 124 HCL patients, we identified a cohort of 6/124 (5%) patients with unusual BRAF mutations: two patients presented non-V600 mutations (BRAF F595L and BRAF W604L, respectively) and four other patients presented silent BRAF mutations. By droplet digital PCR (ddPCR), three of the four patients with concomitant BRAF V600E and silent mutations were negative. The respective role of these mutations in the occurrence of HCL or its progression remains to be clarified, but BRAF sequencing is necessary when ddPCR is negative for BRAF V600E.
Affiliation(s)
- Elsa Maitre
  - Laboratoire d'Hématologie, CHU Caen Normandie, Caen, France
- Margaret Macro
  - Institut bas Normand d'Hématologie, CHU Caen Normandie, Caen, France
- Xavier Troussard
  - Laboratoire d'Hématologie, CHU Caen Normandie, Caen, France
  - Institut bas Normand d'Hématologie, CHU Caen Normandie, Caen, France
5
Implementing computational methods in tandem with synonymous gene recoding for therapeutic development. Trends Pharmacol Sci 2023; 44:73-84. [PMID: 36307252 DOI: 10.1016/j.tips.2022.09.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/21/2022] [Revised: 09/26/2022] [Accepted: 09/27/2022] [Indexed: 12/24/2022]
Abstract
Synonymous gene recoding, the substitution of synonymous variants into the genetic sequence, has been used to overcome many production limitations in therapeutic development. However, the safety and efficacy of recoded therapeutics can be difficult to evaluate because synonymous codon substitutions can result in subtle, yet impactful changes in protein features and require sensitive methods for detection. Given that computational approaches have made significant leaps in recent years, we propose that machine-learning (ML) tools may be leveraged to assess gene-recoded therapeutics and foresee an opportunity to adapt codon contexts to enhance some powerful existing tools. Here, we examine how synonymous gene recoding has been used to address challenges in therapeutic development, explain the biological mechanisms underlying its effects, and explore the application of computational platforms to improve the surveillance of functional variants in therapeutic design.
6
Synonymous mutation rs1129293 is associated with PIK3CG expression and PI3Kγ activation in patients with chronic Chagas cardiomyopathy. Immunobiology 2022; 227:152242. [PMID: 35870262 DOI: 10.1016/j.imbio.2022.152242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 06/23/2022] [Accepted: 07/06/2022] [Indexed: 11/20/2022]
Abstract
Single nucleotide polymorphisms (SNPs) that do not change the composition of amino acids and cause synonymous mutations (sSNPs) were previously considered to lack any functional roles. However, sSNPs have recently been shown to interfere with protein expression owing to a myriad of factors related to the regulation of transcription, mRNA stability, and protein translation processes. In patients with Chagas disease, the presence of the synonymous mutation rs1129293 in the phosphatidylinositol-4,5-bisphosphate 3-kinase gamma (PIK3CG) gene contributes to the development of chronic Chagas cardiomyopathy (CCC), instead of the digestive or asymptomatic forms. In this study, we aimed to investigate whether rs1129293 is associated with the transcription of PIK3CG mRNA and its activity by quantifying AKT phosphorylation in the heart samples of 26 chagasic patients with CCC. Our results showed an association between rs1129293 and decreased PIK3CG mRNA expression levels in the cardiac tissues of patients with CCC. The phosphorylation levels of AKT, the protein target of PI3K, were also reduced in patients with this mutation, but were not correlated with PIK3CG mRNA expression levels. Moreover, bioinformatics analysis showed that rs1129293 and other SNPs in linkage disequilibrium (LD) were associated with the transcriptional regulatory elements, post-transcriptional modifications, and cell-specific splicing expression of PIK3CG mRNA. Therefore, our data demonstrate that the synonymous SNP rs1129293 is capable of affecting PIK3CG mRNA expression and PI3Kγ activation.
7
Zhang S, Yang G. IL22RA1/JAK/STAT Signaling Acts As a Cancer Target Through Pan-Cancer Analysis. Front Immunol 2022; 13:915246. [PMID: 35874683 PMCID: PMC9304570 DOI: 10.3389/fimmu.2022.915246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Accepted: 06/20/2022] [Indexed: 11/29/2022] Open
Abstract
Cytokines and cytokine receptors are important mediators in immunity and cancer development. Interleukin 22 (IL22) is one of the most important cytokines and has a protumor effect. Given the common and specific roles of cytokines and their receptors in multiple cancers, we conducted a pan-cancer study to investigate the role of IL22RA1 in cancer using The Cancer Genome Atlas (TCGA) database. Notably, we found that the IL22RA1 transcript was upregulated in 11 cancer types compared with their corresponding controls. The mRNA expression level of IL22RA1 was highest in the pancreas among tumor tissues. Higher expression of IL22RA1 was associated with a worse overall survival rate in patients. A total of 30 IL22RA1-correlated genes (e.g. IL17D, IL22RA2, IL20RB, IL10RA, IL10RB, TSLP and TYK2) are involved in the JAK/STAT pathway, which promotes tumor progression. The upregulation of IL22RA1 in tumors was correlated with immune cell infiltration level. Higher expression of IL22RA2, IL20RB, IL10RA, IL10RB, TSLP, TYK2, STAT1 and STAT3 was associated with a decreased overall survival rate in patients. IL22RA1 mutation was observed more often in uterine cancer and melanoma than in the other cancer types. Deactivation of IL22RA1 induced many changes in gene expression. In uterine cancer, IL22RA1 mutants had upregulated DNA damage/repair genes and downregulated genes in the FoxO signaling pathway. In melanoma, mutation of IL22RA1 can upregulate the HIF signaling pathway but downregulate metabolic pathways. Our study suggests that IL22RA1/JAK/STAT signaling can be an important target for cancer treatment.
Affiliation(s)
- Shuai Zhang
  - Department of Pathology and Laboratory Medicine, Davis Health, University of California, Sacramento, CA, United States
  - College of Veterinary Medicine, Northeast Agricultural University, Harbin, China
- Guiyan Yang
  - Department of Pathology and Laboratory Medicine, Davis Health, University of California, Sacramento, CA, United States
  - College of Veterinary Medicine, China Agricultural University, Beijing, China
  - *Correspondence: Guiyan Yang,
8
Kaissarian NM, Meyer D, Kimchi-Sarfaty C. Synonymous Variants: Necessary Nuance in our Understanding of Cancer Drivers and Treatment Outcomes. J Natl Cancer Inst 2022; 114:1072-1094. [PMID: 35477782 PMCID: PMC9360466 DOI: 10.1093/jnci/djac090] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/18/2022] [Revised: 03/24/2022] [Accepted: 04/18/2022] [Indexed: 11/13/2022] Open
Abstract
Once called "silent mutations" and assumed to have no effect on protein structure and function, synonymous variants are now recognized to be drivers for some cancers. There have been significant advances in our understanding of the numerous mechanisms by which synonymous single nucleotide variants (sSNVs) can affect protein structure and function by affecting pre-mRNA splicing, mRNA expression, stability, folding, miRNA binding, translation kinetics, and co-translational folding. This review highlights the need for considering sSNVs in cancer biology to gain a better understanding of the genetic determinants of human cancers and to improve their diagnosis and treatment. We surveyed the literature for reports of sSNVs in cancer and found numerous studies on the consequences of sSNVs on gene function with supporting in vitro evidence. We also found reports of sSNVs that have statistically significant associations with specific cancer types but for which in vitro studies are lacking to support the reported associations. Additionally, we found reports of germline and somatic sSNVs that were observed in numerous clinical studies and for which in silico analysis predicts possible effects on gene function. We provide a review of these investigations and discuss necessary future studies to elucidate the mechanisms by which sSNVs disrupt protein function and play a role in tumorigenesis, cancer progression, and treatment efficacy. As splicing dysregulation is one of the most well recognized mechanisms by which sSNVs impact protein function, we also include our own in silico analysis for predicting which sSNVs may disrupt pre-mRNA splicing.
Affiliation(s)
- Nayiri M Kaissarian
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
- Douglas Meyer
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
- Chava Kimchi-Sarfaty
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
9
Pecka-Kiełb E, Kowalewska-Łuczak I, Czerniawska-Piątkowska E, Króliczewska B. FASN, SCD1 and ANXA9 gene polymorphism as genetic predictors of the fatty acid profile of sheep milk. Sci Rep 2021; 11:23761. [PMID: 34887487 PMCID: PMC8660767 DOI: 10.1038/s41598-021-03186-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Accepted: 11/23/2021] [Indexed: 01/05/2023] Open
Abstract
In this study, single nucleotide polymorphisms (SNPs) in the ANXA9 (annexin 9), FASN (fatty acid synthase) and SCD1 (stearoyl-CoA desaturase 1) genes were analyzed as factors influencing fatty acid profiles in milk from Zošľachtená valaška sheep. SNPs in the selected genes were identified using polymerase chain reaction (PCR) and restriction fragment length polymorphism (PCR–RFLP) analysis. The long-chain fatty acid profile in sheep milk was determined by gas chromatography. Statistical analysis of the SCD1/Cfr13I polymorphism showed that the milk of the homozygous AA animals was characterized by a lower (P < 0.05) share of C4:0, C6:0, C8:0, C10:0, C12:0 and C14:0 in comparison to the homozygous CC sheep. The milk of heterozygous sheep was characterized by a higher (P < 0.05) proportion of C13:0 acid compared to the milk of sheep with the homozygous AA type. A higher (P < 0.05) level of saturated fatty acids (SFA) was found in the milk of CC genotype sheep compared to the AA genotype. Our results lead to the conclusion that the greatest changes were observed for the SCD1/Cfr13I polymorphism and the least significant ones for FASN/AciI. Moreover, this is the first evidence that milk from sheep with the SCD1/Cfr13I polymorphism and the homozygous AA genotype showed the most desirable fatty acid profile.
Affiliation(s)
- Ewa Pecka-Kiełb
  - Department of Biostructure and Animal Physiology, Faculty of Veterinary Medicine, Wroclaw University of Environmental and Life Sciences, Norwida 31, 50-375, Wrocław, Poland
- Inga Kowalewska-Łuczak
  - Department of Genetics, Faculty of Biotechnology and Animal Husbandry, West Pomeranian University of Technology in Szczecin, Piastów Avenue 45, 79-311, Szczecin, Poland
- Ewa Czerniawska-Piątkowska
  - Department of Ruminant Science, Faculty of Biotechnology and Animal Husbandry, West Pomeranian University of Technology in Szczecin, Klemensa Janickiego 29, 71-270, Szczecin, Poland
- Bożena Króliczewska
  - Department of Biostructure and Animal Physiology, Faculty of Veterinary Medicine, Wroclaw University of Environmental and Life Sciences, Norwida 31, 50-375, Wrocław, Poland