1
|
Rahman MM, Nasir MK, Nur-A-Alam M, Khan MSI. Proposing a hybrid technique of feature fusion and convolutional neural network for melanoma skin cancer detection. J Pathol Inform 2023; 14:100341. [PMID: 38028129 PMCID: PMC10630642 DOI: 10.1016/j.jpi.2023.100341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 09/20/2023] [Accepted: 10/09/2023] [Indexed: 12/01/2023] Open
Abstract
Skin cancer is among the most common cancer types worldwide. Automatic identification of skin cancer is complicated by poor contrast and the apparent resemblance between skin and lesions. Mortality could be significantly reduced if melanoma skin cancer were detected quickly using dermoscopy images. This research applies an anisotropic diffusion filtering method to dermoscopy images to remove multiplicative speckle noise. The fast-bounding box (FBB) method is then applied to segment the skin cancer region. We also employ 2 feature extractors to represent images. The first is the Hybrid Feature Extractor (HFE), and the second is a VGG19-based convolutional neural network (CNN). The HFE combines 3 feature extraction approaches, namely Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), and Speeded-Up Robust Features (SURF), into a single fused feature vector. The CNN is used to extract additional features from the test and training datasets. These 2 feature vectors are then fused to design the classification model. The proposed method is evaluated on 2 datasets, namely ISIC 2017 and the academic torrents dataset. It achieves 99.85%, 91.65%, and 95.70% in terms of accuracy, sensitivity, and specificity, respectively, making it more successful than previously proposed machine learning algorithms.
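The fusion step described in this abstract can be illustrated with a minimal sketch. It assumes fusion means simple concatenation of per-image descriptors, which the abstract does not confirm; the per-block L2 normalization, the function names `l2_normalize` and `fuse_features`, and the toy descriptor values are all illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_features(*blocks):
    """Concatenate several feature blocks into one vector, normalizing each block first.

    Normalizing per block is an assumption made here so that no single
    descriptor dominates the fused vector purely because of its scale.
    """
    fused = []
    for block in blocks:
        fused.extend(l2_normalize(block))
    return fused


# Toy placeholder descriptors standing in for real HOG, LBP, SURF, and CNN outputs.
hog = [3.0, 4.0]
lbp = [1.0, 0.0, 0.0]
surf = [0.0, 2.0]
cnn = [6.0, 8.0]

handcrafted = fuse_features(hog, lbp, surf)  # hypothetical "HFE" vector (length 7)
fused = fuse_features(handcrafted, cnn)      # handcrafted + CNN features (length 9)
print(len(handcrafted), len(fused))
```

In practice the fused vector would be passed to a classifier; the sketch stops at the concatenation because the abstract does not specify the classifier's input format.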
Affiliation(s)
- Md. Mahbubur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Mirpur-2, Dhaka 1216, Bangladesh
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Mostofa Kamal Nasir
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Md. Nur-A-Alam
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Department of CSE, Dhaka International University, Dhaka 1205, Bangladesh
| | - Md. Saikat Islam Khan
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Department of CSE, Dhaka International University, Dhaka 1205, Bangladesh
| |
|
2
|
Selective Depletion of ZAP-Binding CpG Motifs in HCV Evolution. Pathogens 2022; 12:pathogens12010043. [PMID: 36678391 PMCID: PMC9866289 DOI: 10.3390/pathogens12010043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 12/08/2022] [Accepted: 12/23/2022] [Indexed: 12/28/2022] Open
Abstract
Hepatitis C virus (HCV) is a bloodborne pathogen that can cause chronic liver disease and hepatocellular carcinoma. The loss of CpGs from virus genomes allows escape from restriction by the host zinc-finger antiviral protein (ZAP). The evolution of HCV in the human host has not been explored in the context of CpG depletion. We analysed 2616 full-length HCV genomes from 1977 to 2021. Over these four decades of evolution in humans, we found that HCV genomes have become significantly depleted in (a) CpG numbers, (b) CpG O/E ratios (i.e., relative abundance of CpGs), and (c) the number of ZAP-binding motifs. Interestingly, our data suggest that the loss of CpGs in HCV genomes over time is primarily driven by the loss of ZAP-binding motifs, pointing to an as yet unknown role for ZAP-mediated selection pressures in HCV evolution. The HCV core gene is significantly enriched for CpGs and ZAP-binding motifs. In contrast to the rest of the HCV genome, the loss of CpGs from the core gene does not appear to be driven by ZAP-mediated selection. This work highlights CpG depletion in HCV genomes during their evolution in humans and the role of ZAP-mediated selection in HCV evolution.
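The CpG O/E ratio mentioned above can be sketched as follows. This assumes the commonly used definition, observed CpG count multiplied by sequence length and divided by the product of the C and G counts; the abstract does not state which exact formula the authors used, and the function name `cpg_oe_ratio` is hypothetical.

```python
def cpg_oe_ratio(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    Returns 0.0 when the sequence contains no C or no G, since the
    expected count is then zero and the ratio is undefined.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * n / (c_count * g_count)


# Toy sequence: 8 bases, 2 C, 3 G, 2 CpG dinucleotides.
print(round(cpg_oe_ratio("ACGTCGGA"), 3))
```

A ratio below 1 indicates fewer CpGs than expected from base composition alone, which is the sense in which a genome is described as CpG-depleted.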
|
3
|
Band SS, Ardabili S, Yarahmadi A, Pahlevanzadeh B, Kiani AK, Beheshti A, Alinejad-Rokny H, Dehzangi I, Chang A, Mosavi A, Moslehpour M. A Survey on Machine Learning and Internet of Medical Things-Based Approaches for Handling COVID-19: Meta-Analysis. Front Public Health 2022; 10:869238. [PMID: 35812486 PMCID: PMC9260273 DOI: 10.3389/fpubh.2022.869238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 04/20/2022] [Indexed: 11/13/2022] Open
Abstract
Early diagnosis, prioritization, screening, clustering, and tracking of patients with COVID-19, and production of drugs and vaccines are some of the applications that have made it necessary to use a new style of technology to manage and deal with this epidemic. Strategies backed by artificial intelligence (A.I.) and the Internet of Things (IoT) have been undeniably effective in understanding how the virus works and preventing it from spreading. Accordingly, the main aim of this survey is to critically review ML, IoT, and integrated IoT and ML-based techniques in applications related to COVID-19, from the diagnosis of the disease to the prediction of its outbreak. According to the main findings, IoT provided a prompt and efficient approach to tracking the spread of the disease. On the other hand, most of the studies based on ML techniques aimed at the detection and handling of challenges associated with the COVID-19 pandemic. Among the different approaches, Convolutional Neural Network (CNN), Support Vector Machine, Genetic CNN, and pre-trained CNN, followed by ResNet, have demonstrated the best performance compared to other methods.
Affiliation(s)
- Shahab S. Band
- Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, Douliou, Taiwan
| | - Sina Ardabili
- Department of Informatics, J. Selye University, Komárom, Slovakia
| | - Atefeh Yarahmadi
- Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, Douliou, Taiwan
| | - Bahareh Pahlevanzadeh
- Department of Design and System Operations, Regional Information Center for Science and Technology (R.I.C.E.S.T.), Shiraz, Iran
| | - Adiqa Kausar Kiani
- Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, Douliou, Taiwan
| | - Amin Beheshti
- Department of Computing, Macquarie University, Sydney, NSW, Australia
| | - Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, U.N.S.W. Sydney, Sydney, NSW, Australia
- U.N.S.W. Data Science Hub, The University of New South Wales (U.N.S.W. Sydney), Sydney, NSW, Australia
- Health Data Analytics Program, AI-enabled Processes (A.I.P.) Research Centre, Macquarie University, Sydney, NSW, Australia
| | - Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, United States
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, United States
| | - Arthur Chang
- Bachelor Program in Interdisciplinary Studies, National Yunlin University of Science and Technology, Douliu, Taiwan
| | - Amir Mosavi
- John von Neumann Faculty of Informatics, Obuda University, Budapest, Hungary
- Institute of Information Engineering, Automation and Mathematics, Slovak University of Technology in Bratislava, Bratislava, Slovakia
| | - Massoud Moslehpour
- Department of Business Administration, College of Management, Asia University, Taichung, Taiwan
- Department of Management, California State University, San Bernardino, CA, United States
| |
|
4
|
Sharifonnasabi F, Jhanjhi NZ, John J, Obeidy P, Band SS, Alinejad-Rokny H, Baz M. Hybrid HCNN-KNN Model Enhances Age Estimation Accuracy in Orthopantomography. Front Public Health 2022; 10:879418. [PMID: 35712286 PMCID: PMC9197238 DOI: 10.3389/fpubh.2022.879418] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/19/2022] [Accepted: 04/22/2022] [Indexed: 11/17/2022] Open
Abstract
Age estimation from dental radiographs, or orthopantomography (OPG), is a medical imaging technique that physicians and pathologists use for disease identification and legal matters, for example, estimating post-mortem interval, detecting child abuse and drug trafficking, and identifying an unknown body. Recent developments in automated image processing models have improved the limited precision of age estimation to an approximate range of +/- 1 year. While this estimate is often accepted as an accurate measurement, age estimation should be as precise as possible in the most serious matters, such as homicide. Current age estimation techniques depend heavily on manual, time-consuming image processing. Age estimation is often time-sensitive, so image processing time is vital. Recent developments in machine learning-based data processing methods have decreased image processing time; however, the accuracy of these techniques remains to be improved. We propose an ensemble method of image classifiers to enhance the accuracy of age estimation using OPGs from 1 year to a couple of months (1-3-6). This hybrid model is based on convolutional neural networks (CNN) and K nearest neighbors (KNN). The hybrid (HCNN-KNN) model was used to investigate 1,922 panoramic dental radiographs of patients aged 15 to 23. These OPGs were obtained from various teaching institutes and private dental clinics in Malaysia. To minimize the chance of overfitting, we used the principal component analysis (PCA) algorithm and eliminated highly correlated features. To further enhance the performance of our hybrid model, we performed systematic image pre-processing. We applied a series of classifications to train our model. We demonstrate that combining these approaches improved the classification and segmentation and thus the age-estimation outcome of the model. Our findings suggest that our model, for the first time to the best of our knowledge, successfully estimated age in classified studies of 1-year, 6-month, 3-month, and 1-month cases with accuracies of 99.98, 99.96, 99.87, and 98.78, respectively.
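The KNN half of such a hybrid can be illustrated with a minimal sketch. It assumes the CNN has already reduced each radiograph to a short feature vector and that KNN then assigns an age class by majority vote among the nearest training vectors; the abstract does not describe how the two components are coupled, and the function name `knn_predict`, the feature values, and the age-class labels below are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors closest to `query`.

    Distance is plain Euclidean distance; ties in the vote are resolved
    by Counter.most_common, i.e. by first-seen order among the neighbours.
    """
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (stand-ins for CNN outputs) with age-class labels.
features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["15-16", "15-16", "22-23", "22-23"]

print(knn_predict(features, labels, query=(0.2, 0.1), k=3))
```

Reducing correlated features before this step, as the abstract describes with PCA, matters for KNN in particular because redundant dimensions are counted repeatedly in the distance.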
Affiliation(s)
- Fatemeh Sharifonnasabi
- Department of Computer Science & Engineering, School of Computing & IT (SoCIT), Taylor's University, Subang Jaya, Malaysia
| | - Noor Zaman Jhanjhi
- Department of Computer Science & Engineering, School of Computing & IT (SoCIT), Taylor's University, Subang Jaya, Malaysia
| | - Jacob John
- Department of Restorative Dentistry, Faculty of Dentistry, University of Malaya, Kuala Lumpur, Malaysia
| | - Peyman Obeidy
- Charles Perkins Centre, Faculty of Medicine and Health, University of Sydney, Darlington, NSW, Australia
| | - Shahab S Band
- Future Technology Research Centre, College of Future, National Yunlin University of Science and Technology, Yunlin, Taiwan
| | - Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, University of New South Wales (UNSW) Sydney, Kensington, NSW, Australia
- UNSW Data Science Hub, The University of New South Wales, UNSW Sydney, Kensington, NSW, Australia
- Health Data Analytics Program, AI-enabled Processes (AIP) Research Centre, Macquarie University, Macquarie Park, NSW, Australia
| | - Mohammed Baz
- Department of Computer Engineering, College of Computer and Information Technology, Taif University, Taif, Saudi Arabia
| |
|
5
|
Rezaie N, Bayati M, Hamidi M, Tahaei MS, Khorasani S, Lovell NH, Breen J, Rabiee HR, Alinejad-Rokny H. Somatic point mutations are enriched in non-coding RNAs with possible regulatory function in breast cancer. Commun Biol 2022; 5:556. [PMID: 35672401 PMCID: PMC9174258 DOI: 10.1038/s42003-022-03528-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/19/2021] [Accepted: 05/24/2022] [Indexed: 11/09/2022] Open
Abstract
Non-coding RNAs (ncRNAs) form a large portion of the mammalian genome. However, their biological functions are poorly characterized in cancers. In this study, using a newly developed tool, SomaGene, we analyze de novo somatic point mutations from the International Cancer Genome Consortium (ICGC) whole-genome sequencing data of 1,855 breast cancer samples. We identify 1030 candidates of ncRNAs that are significantly and explicitly mutated in breast cancer samples. By integrating data from the ENCODE regulatory features and FANTOM5 expression atlas, we show that the candidate ncRNAs significantly enrich active chromatin histone marks (1.9 times), CTCF binding sites (2.45 times), DNase accessibility (1.76 times), HMM predicted enhancers (2.26 times) and eQTL polymorphisms (1.77 times). Importantly, we show that the 1030 ncRNAs contain a much higher level (3.64 times) of breast cancer-associated genome-wide association (GWAS) single nucleotide polymorphisms (SNPs) than genome-wide expectation. Such enrichment has not been seen with GWAS SNPs from other cancers. Using breast cell line related Hi-C data, we then show that 82% of our candidate ncRNAs (1.9 times) significantly interact with the promoter of protein-coding genes, including previously known cancer-associated genes, suggesting the critical role of candidate ncRNA genes in the activation of essential regulators of development and differentiation in breast cancer. We provide an extensive web-based resource (https://www.ihealthe.unsw.edu.au/research) to communicate our results with the research community. Our list of breast cancer-specific ncRNA genes has the potential to provide a better understanding of the underlying genetic causes of breast cancer. Lastly, the tool developed in this study can be used to analyze somatic mutations in all cancers. The SomaGene tool is developed to identify non-coding RNAs (ncRNAs) mutated in breast cancer but can be used for other cancers. 
Candidate ncRNAs are shown to be enriched for regulatory features and to contain specific trait loci polymorphisms.
Affiliation(s)
- Narges Rezaie
- Center for Complex Biological Systems, University of California Irvine, Irvine, CA, 92697, USA
| | - Masroor Bayati
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, 11365, Iran
| | - Mehrab Hamidi
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, 11365, Iran
| | - Maedeh Sadat Tahaei
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, 11365, Iran
| | - Sadegh Khorasani
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, 11365, Iran
| | - Nigel H Lovell
- Tyree Institute of Health Engineering and The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
| | - James Breen
- South Australian Health & Medical Research Institute, Adelaide, SA, 5000, Australia
- Robinson Research Institute, University of Adelaide, Adelaide, SA, 5006, Australia
- Bioinformatics Hub, University of Adelaide, Adelaide, SA, 5006, Australia
| | - Hamid R Rabiee
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, 11365, Iran.
| | - Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
- UNSW Data Science Hub, The University of New South Wales (UNSW Sydney), Sydney, NSW, 2052, Australia
- Health Data Analytics Program, AI-enabled Processes (AIP) Research Centre, Macquarie University, Sydney, NSW, 2109, Australia
| |
|
6
|
Four-layer ConvNet to facial emotion recognition with minimal epochs and the significance of data diversity. Sci Rep 2022; 12:6991. [PMID: 35484318 PMCID: PMC9050748 DOI: 10.1038/s41598-022-11173-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/10/2021] [Accepted: 04/06/2022] [Indexed: 11/08/2022] Open
Abstract
Emotion recognition is defined as identifying human emotion and is directly related to fields such as human-computer interfaces, human emotional processing, irrational analysis, medical diagnostics, data-driven animation, human-robot communication, and many more. This paper proposes a new facial emotion recognition model using a convolutional neural network. Our proposed model, "ConvNet", detects seven specific emotions from image data: anger, disgust, fear, happiness, neutrality, sadness, and surprise. Features extracted from facial expression images by the Local Binary Pattern (LBP), region-based Oriented FAST and Rotated BRIEF (ORB), and a convolutional neural network (CNN) were fused to develop the classification model, which was trained with our proposed CNN model (ConvNet). Our method converges quickly and achieves good performance, which makes it possible to develop a real-time schema that can easily fit the model and sense emotions. Furthermore, this study focuses on a person's mental or emotional state as inferred from behavioral aspects. To train the CNN model, we first use the FER2013 database, and then apply generalization techniques to the JAFFE and CK+ datasets in the testing stage to evaluate the performance of the model. In the generalization approach, we obtain 92.05% accuracy on the JAFFE dataset and 98.13% accuracy on the CK+ dataset, the best performance among existing methods. We also test the system by identifying facial expressions in real time. ConvNet consists of four convolution layers together with two fully connected layers. The experimental results show that ConvNet achieves 96% training accuracy, which is much better than existing models. When compared to other validation methods, the suggested technique was also more accurate. ConvNet also achieved a validation accuracy of 91.01% on the FER2013 dataset. We also made all the materials publicly accessible for the research community at: https://github.com/Tanoy004/Emotion-recognition-through-CNN .
|
7
|
Simón D, Cristina J, Musto H. An overview of dinucleotide and codon usage in all viruses. Arch Virol 2022; 167:1443-1448. [PMID: 35467158 DOI: 10.1007/s00705-022-05454-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/01/2022] [Accepted: 04/05/2022] [Indexed: 11/30/2022]
Abstract
Viruses are, by far, the most abundant biological entities on earth. They are found in all known ecological niches and are the causative agents of many important diseases in plants and animals. From an evolutionary point of view, since viruses do not share any orthologous genes, there is a general consensus that they are polyphyletic; that is, they do not have a common ancestor. This means that they appeared several times during the course of evolution. For their life cycle, they are always obligate parasites of a free cellular life form, which can be bacteria, archaea, or eukaryotes. More complexity is added to these entities by the fact that their genetic material can be DNA or RNA (double- or single-stranded) or retrotranscribed. Given these features, we wondered if some general rules can be inferred when studying two basic genomic signatures-dinucleotides and codon usage-analyzing all available complete and non-redundant viral sequences. In spite of the obviously biased sample of sequences available, some general features appear to emerge.
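The two genomic signatures named in this abstract can be sketched in a few lines. This assumes the usual definitions: codon usage as the relative frequency of each in-frame triplet in a coding sequence, and dinucleotide relative abundance as the observed dinucleotide frequency divided by the product of its two mononucleotide frequencies. The abstract does not state the exact measures used, and the function names are hypothetical.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each in-frame codon; trailing partial codons are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def dinucleotide_odds(seq):
    """Observed/expected ratio for every dinucleotide present in the sequence.

    Expected frequency is the product of the two mononucleotide frequencies.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    return {
        pair: (count / (n - 1)) / ((mono[pair[0]] / n) * (mono[pair[1]] / n))
        for pair, count in di.items()
    }


usage = codon_usage("ATGGCCGCCTAA")  # toy coding sequence: ATG GCC GCC TAA
odds = dinucleotide_odds("ACGT")     # toy sequence with three dinucleotides
print(usage["GCC"], round(odds["CG"], 3))
```

Values of the odds ratio well below 1 mark under-represented dinucleotides, which is how CpG suppression in viral genomes is usually expressed.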
Affiliation(s)
- Diego Simón
- Laboratorio de Genómica Evolutiva, Departamento de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
| | - Juan Cristina
- Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
| | - Héctor Musto
- Laboratorio de Genómica Evolutiva, Departamento de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay.
| |
|
8
|
Dashti H, Dehzangi I, Bayati M, Breen J, Beheshti A, Lovell N, Rabiee HR, Alinejad-Rokny H. Integrative analysis of mutated genes and mutational processes reveals novel mutational biomarkers in colorectal cancer. BMC Bioinformatics 2022; 23:138. [PMID: 35439935 PMCID: PMC9017053 DOI: 10.1186/s12859-022-04652-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Accepted: 03/24/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths worldwide. Recent studies have observed causative mutations in susceptible genes related to colorectal cancer in 10 to 15% of the patients. This highlights the importance of identifying mutations for early detection of this cancer for more effective treatments among high risk individuals. Mutation is considered as the key point in cancer research. Many studies have performed cancer subtyping based on the type of frequently mutated genes, or the proportion of mutational processes. However, to the best of our knowledge, combination of these features has never been used together for this task. This highlights the potential to introduce better and more inclusive subtype classification approaches using wider range of related features to enable biomarker discovery and thus inform drug development for CRC. RESULTS In this study, we develop a new pipeline based on a novel concept called 'gene-motif', which merges mutated gene information with tri-nucleotide motif of mutated sites, for colorectal cancer subtype identification. We apply our pipeline to the International Cancer Genome Consortium (ICGC) CRC samples and identify, for the first time, 3131 gene-motif combinations that are significantly mutated in 536 ICGC colorectal cancer samples. Using these features, we identify seven CRC subtypes with distinguishable phenotypes and biomarkers, including unique cancer related signaling pathways, in which for most of them targeted treatment options are currently available. Interestingly, we also identify several genes that are mutated in multiple subtypes but with unique sequence contexts. CONCLUSION Our results highlight the importance of considering both the mutation type and mutated genes in identification of cancer subtypes and cancer biomarkers. 
The new CRC subtypes presented in this study demonstrate distinct phenotypic properties that can be used to develop new treatments. By knowing the genes and phenotypes associated with the subtypes, a personalized treatment plan can be developed that considers the specific phenotypes associated with a patient's genomic lesion.
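The 'gene-motif' idea, merging the mutated gene with the tri-nucleotide context of the mutated site, can be illustrated with a small sketch. The key format `GENE|X[R>A]Y`, the function name `gene_motif`, and the toy sequence and mutations are assumptions made for illustration; the abstract does not give the authors' actual encoding.

```python
from collections import Counter


def gene_motif(gene, ref_seq, pos, alt):
    """Build a hypothetical 'GENE|X[R>A]Y' key for a point mutation.

    `pos` is a 0-based index into `ref_seq`; X and Y are the flanking
    reference bases, R the reference base, and A the mutated base.
    """
    if pos < 1 or pos > len(ref_seq) - 2:
        raise ValueError("position needs one flanking base on each side")
    left, ref, right = ref_seq[pos - 1], ref_seq[pos], ref_seq[pos + 1]
    return f"{gene}|{left}[{ref}>{alt}]{right}"


# Toy reference context and mutations as (gene, position, alternate base).
reference = "ACGTA"
mutations = [("KRAS", 2, "A"), ("KRAS", 2, "A"), ("TP53", 1, "T")]

feature_counts = Counter(gene_motif(g, reference, p, a) for g, p, a in mutations)
print(feature_counts)
```

Counting such keys per sample yields a feature vector in which the same gene can contribute several distinct features, one per sequence context, which matches the abstract's observation that some genes are mutated in multiple subtypes but with unique sequence contexts.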
Affiliation(s)
- Hamed Dashti
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
| | - Iman Dehzangi
- Center for Computational and Integrative Biology (CCIB), Rutgers University, Camden, NJ, 08102, USA
| | - Masroor Bayati
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
| | - James Breen
- South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia
- Robinson Research Institute, University of Adelaide, Adelaide, SA, 5006, Australia
- Bioinformatics Hub, University of Adelaide, Adelaide, SA, 5006, Australia
| | - Amin Beheshti
- Department of Computing, Macquarie University, Sydney, NSW, 2109, Australia
| | - Nigel Lovell
- Tyree Institute of Health Engineering and The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
| | - Hamid R Rabiee
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran.
| | - Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
- UNSW Data Science Hub, The University of New South Wales, Sydney, NSW, 2052, Australia
- Health Data Analytics Program, AI-Enabled Processes (AIP) Research Centre, Macquarie University, Sydney, 2109, Australia
| |
|
9
|
The low abundance of CpG in the SARS-CoV-2 genome is not an evolutionarily signature of ZAP. Sci Rep 2022; 12:2420. [PMID: 35165300 PMCID: PMC8844275 DOI: 10.1038/s41598-022-06046-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/17/2021] [Accepted: 12/28/2021] [Indexed: 12/23/2022] Open
Abstract
The zinc finger antiviral protein (ZAP) is known to restrict viral replication by binding to the CpG rich regions of viral RNA, and subsequently inducing viral RNA degradation. This enzyme has recently been shown to be capable of restricting SARS-CoV-2. These data have led to the hypothesis that the low abundance of CpG in the SARS-CoV-2 genome is due to an evolutionary pressure exerted by the host ZAP. To investigate this hypothesis, we performed a detailed analysis of many coronavirus sequences and ZAP RNA binding preference data. Our analyses showed neither evidence for an evolutionary pressure acting specifically on CpG dinucleotides, nor a link between the activity of ZAP and the low CpG abundance of the SARS-CoV-2 genome.
|
10
|
Sertkaya H, Hidalgo L, Ficarelli M, Kmiec D, Signell AW, Ali S, Parker H, Wilson H, Neil SJ, Malim MH, Vink CA, Swanson CM. Minimal impact of ZAP on lentiviral vector production and transduction efficiency. Mol Ther Methods Clin Dev 2021; 23:147-157. [PMID: 34703838 PMCID: PMC8517000 DOI: 10.1016/j.omtm.2021.08.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/08/2020] [Accepted: 08/24/2021] [Indexed: 11/29/2022]
Abstract
The antiviral protein ZAP binds CpG dinucleotides in viral RNA to inhibit replication. This has likely led to the CpG suppression observed in many RNA viruses, including retroviruses. Sequences added to retroviral vector genomes, such as internal promoters, transgenes, or regulatory elements, substantially increase CpG abundance. Because these CpGs could allow retroviral vector RNA to be targeted by ZAP, we analyzed whether it restricts vector production, transduction efficiency, and transgene expression. Surprisingly, even though CpG-high HIV-1 was efficiently inhibited by ZAP in HEK293T cells, depleting ZAP did not substantially increase lentiviral vector titer using several packaging and genome plasmids. ZAP overexpression also did not inhibit lentiviral vector titer. In addition, decreasing CpG abundance in a lentiviral vector genome did not increase its titer, and a gammaretroviral vector derived from murine leukemia virus was not substantially restricted by ZAP. Overall, we show that the increased CpG abundance in retroviral vectors relative to the wild-type retroviruses they are derived from does not intrinsically sensitize them to ZAP. Further understanding of how ZAP specifically targets transcripts to inhibit their expression may allow the development of CpG sequence contexts that efficiently recruit or evade this antiviral system.
Affiliation(s)
- Helin Sertkaya
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Laura Hidalgo
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Mattia Ficarelli
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Dorota Kmiec
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Adrian W. Signell
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Sadfer Ali
- Cell & Gene Therapy Platform, Medicinal Science and Technology, GSK, Stevenage SG1 2NY, UK
| | - Hannah Parker
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Harry Wilson
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Stuart J.D. Neil
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Michael H. Malim
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| | - Conrad A. Vink
- Cell & Gene Therapy Platform, Medicinal Science and Technology, GSK, Stevenage SG1 2NY, UK
| | - Chad M. Swanson
- Department of Infectious Diseases, King’s College London, London SE1 9RT, UK
| |
|
11
|
Arumugam T, Ramphal U, Adimulam T, Chinniah R, Ramsuran V. Deciphering DNA Methylation in HIV Infection. Front Immunol 2021; 12:795121. [PMID: 34925380 PMCID: PMC8674454 DOI: 10.3389/fimmu.2021.795121] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/14/2021] [Accepted: 11/17/2021] [Indexed: 12/13/2022] Open
Abstract
With approximately 38 million people living with HIV/AIDS globally, and a further 1.5 million new global infections per year, it is imperative that we advance our understanding of all factors contributing to HIV infection. While most studies have focused on the influence of host genetic factors on HIV pathogenesis, epigenetic factors are gaining attention. Epigenetics involves alterations in gene expression without altering the DNA sequence. DNA methylation is a critical epigenetic mechanism that influences both viral and host factors. This review has five focal points: (i) fluctuations in the expression of methylation-modifying factors upon HIV infection; (ii) the effect of DNA methylation on HIV viral genes and (iii) on the host genome; (iv) inferences from other infectious and non-communicable diseases, for which we provide a list of HIV-associated host genes that are regulated by methylation in other disease models; and (v) the potential of DNA methylation as an epi-therapeutic strategy and biomarker. DNA methylation has also been shown to serve as a robust therapeutic strategy and precision medicine biomarker against diseases such as cancer and autoimmune conditions. Despite new drugs being discovered for HIV, drug resistance is a problem in high disease burden settings such as Sub-Saharan Africa. Furthermore, genetic therapies that are under investigation are irreversible and may have off-target effects. Alternative, nongenetic therapies are essential. In this review, we discuss the potential role of DNA methylation as a novel therapeutic intervention against HIV.
Affiliation(s)
- Thilona Arumugam
  - School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Upasana Ramphal
  - School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
  - Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
  - KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Theolan Adimulam
  - School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Romona Chinniah
  - School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Veron Ramsuran
  - School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
  - Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
  - KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
|
12
|
Ghareyazi A, Mohseni A, Dashti H, Beheshti A, Dehzangi A, Rabiee HR, Alinejad-Rokny H. Whole-Genome Analysis of De Novo Somatic Point Mutations Reveals Novel Mutational Biomarkers in Pancreatic Cancer. Cancers (Basel) 2021; 13:4376. [PMID: 34503185 PMCID: PMC8431675 DOI: 10.3390/cancers13174376] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/29/2021] [Revised: 08/16/2021] [Accepted: 08/16/2021] [Indexed: 12/15/2022] Open
Abstract
It is now known that at least 10% of samples with pancreatic cancers (PC) contain a causative mutation in the known susceptibility genes, suggesting the importance of identifying cancer-associated genes that carry the causative mutations in high-risk individuals for early detection of PC. In this study, we develop a statistical pipeline using a new concept, called gene-motif, that utilizes both mutated genes and mutational processes to identify 4211 3-nucleotide PC-associated gene-motifs within 203 significantly mutated genes in PC. Using these gene-motifs as distinguishable features for pancreatic cancer subtyping results in identifying five PC subtypes with distinguishable phenotypes and genotypes. Our comprehensive biological characterization reveals that these PC subtypes are associated with different molecular mechanisms, including unique cancer-related signaling pathways; for most of the subtypes, targeted treatment options are currently available. Some of the pathways we identified in all five PC subtypes, including the cell cycle and axon guidance pathways, are frequently seen and mutated in cancer. We also identified the protein kinase C, EGFR (epidermal growth factor receptor), and p53 signaling pathways as potential targets for treatment of the PC subtypes. Altogether, our results uncover the importance of considering both the mutation type and mutated genes in the identification of cancer subtypes and biomarkers.
Affiliation(s)
- Amin Ghareyazi
  - Bioinformatics and Computational Biology Laboratory, Sharif University of Technology, Tehran 11365, Iran
- Amir Mohseni
  - Bioinformatics and Computational Biology Laboratory, Sharif University of Technology, Tehran 11365, Iran
- Hamed Dashti
  - Bioinformatics and Computational Biology Laboratory, Sharif University of Technology, Tehran 11365, Iran
- Amin Beheshti
  - Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
- Abdollah Dehzangi
  - Department of Computer Science, Rutgers University, Camden, NJ 08102, USA
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
- Hamid R. Rabiee
  - Bioinformatics and Computational Biology Laboratory, Sharif University of Technology, Tehran 11365, Iran
- Hamid Alinejad-Rokny
  - BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, The University of New South Wales, Sydney, NSW 2052, Australia
  - UNSW Data Science Hub, The University of New South Wales, Sydney, NSW 2052, Australia
  - Health Data Analytics Program, AI-Enabled Processes (AIP) Research Centre, Macquarie University, Sydney, NSW 2109, Australia
|
13
|
Goswami P, Bartas M, Lexa M, Bohálová N, Volná A, Červeň J, Červeňová V, Pečinka P, Špunda V, Fojta M, Brázda V. SARS-CoV-2 hot-spot mutations are significantly enriched within inverted repeats and CpG island loci. Brief Bioinform 2021; 22:1338-1345. [PMID: 33341900 PMCID: PMC7799342 DOI: 10.1093/bib/bbaa385] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/16/2020] [Revised: 11/16/2020] [Accepted: 11/27/2020] [Indexed: 12/18/2022] Open
Abstract
SARS-CoV-2 is an intensively investigated virus from the order Nidovirales (Coronaviridae family) that causes COVID-19 disease in humans. Through enormous scientific effort, thousands of viral strains have been sequenced to date, thereby creating a strong background for deep bioinformatics studies of the SARS-CoV-2 genome. In this study, we inspected high-frequency mutations of SARS-CoV-2 and carried out systematic analyses of their overlay with inverted repeat (IR) loci and CpG islands. The main conclusion of our study is that SARS-CoV-2 hot-spot mutations are significantly enriched within both IRs and CpG island loci. This points to their role in genomic instability and may predict further mutational drive of the SARS-CoV-2 genome. Moreover, CpG islands are strongly enriched upstream from viral ORFs and thus could play important roles in transcription and the viral life cycle. We hypothesize that hypermethylation of these loci will decrease the transcription of viral ORFs and could therefore limit the progression of the disease.
Affiliation(s)
- Pratik Goswami
  - Department of Biophysical Chemistry and Molecular Oncology, Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Martin Bartas
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Matej Lexa
  - Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Natália Bohálová
  - Department of Biophysical Chemistry and Molecular Oncology, Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
  - Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Adriana Volná
  - Department of Physics, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Jiří Červeň
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Veronika Červeňová
  - Department of Mathematics, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Petr Pečinka
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Vladimír Špunda
  - Department of Physics, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
  - Global Change Research Institute of the Czech Academy of Sciences, Brno, Czech Republic
- Miroslav Fojta
  - Department of Biophysical Chemistry and Molecular Oncology, Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
- Václav Brázda
  - Department of Biophysical Chemistry and Molecular Oncology, Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
|
14
|
VIRMOTIF: A User-Friendly Tool for Viral Sequence Analysis. Genes (Basel) 2021; 12:genes12020186. [PMID: 33514039 PMCID: PMC7911170 DOI: 10.3390/genes12020186] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/28/2020] [Revised: 01/10/2021] [Accepted: 01/19/2021] [Indexed: 12/16/2022] Open
Abstract
Bioinformatics and computational biology have significantly contributed to the generation of vast and important knowledge that can lead to great improvements and advancements in biology and its related fields. Over the past three decades, a wide range of tools and methods have been developed and proposed to enhance performance, diagnosis, and throughput while maintaining feasibility and convenience for users. Here, we propose a new user-friendly comprehensive tool called VIRMOTIF to analyze DNA sequences. VIRMOTIF brings different tools together as one package so that users can perform their analysis as a whole and in one place. VIRMOTIF is able to complete different tasks, including computing the number or probability of motifs appearing in DNA sequences, visualizing data using the matplotlib and heatmap libraries, and clustering data using four different methods, namely K-means, PCA, Mean Shift, and ClusterMap. VIRMOTIF is the only tool with the ability to analyze genomic motifs based on their frequency and representation (D-ratio) in a virus genome.
|
15
|
Pollock DD, Castoe TA, Perry BW, Lytras S, Wade KJ, Robertson DL, Holmes EC, Boni MF, Kosakovsky Pond SL, Parry R, Carlton EJ, Wood JLN, Pennings PS, Goldstein RA. Viral CpG Deficiency Provides No Evidence That Dogs Were Intermediate Hosts for SARS-CoV-2. Mol Biol Evol 2020; 37:2706-2710. [PMID: 32658964 PMCID: PMC7454803 DOI: 10.1093/molbev/msaa178] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/16/2022] Open
Abstract
Due to the scope and impact of the COVID-19 pandemic there exists a strong desire to understand where the SARS-CoV-2 virus came from and how it jumped species boundaries to humans. Molecular evolutionary analyses can trace viral origins by establishing relatedness and divergence times of viruses and identifying past selective pressures. However, we must uphold rigorous standards of inference and interpretation on this topic because of the ramifications of being wrong. Here, we dispute the conclusions of Xia (2020. Extreme genomic CpG deficiency in SARS-CoV-2 and evasion of host antiviral defense. Mol Biol Evol. doi:10.1093/molbev/masa095) that dogs are a likely intermediate host of a SARS-CoV-2 ancestor. We highlight major flaws in Xia's inference process and his analysis of CpG deficiencies, and conclude that there is no direct evidence for the role of dogs as intermediate hosts. Bats and pangolins currently have the greatest support as ancestral hosts of SARS-CoV-2, with the strong caveat that sampling of wildlife species for coronaviruses has been limited.
Affiliation(s)
- David D Pollock
  - Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO
- Todd A Castoe
  - Department of Biology, University of Texas Arlington, Arlington, TX
- Blair W Perry
  - Department of Biology, University of Texas Arlington, Arlington, TX
- Spyros Lytras
  - MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow, United Kingdom
- Kristen J Wade
  - Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO
- David L Robertson
  - MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow, United Kingdom
- Edward C Holmes
  - Marie Bashir Institute for Infectious Diseases & Biosecurity, School of Life & Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, NSW, Australia
- Maciej F Boni
  - Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA
- Rhys Parry
  - Australian Infectious Disease Research Centre, School of Biological Sciences, The University of Queensland, Brisbane, QLD, Australia
- Elizabeth J Carlton
  - Department of Environmental and Occupational Health, Colorado School of Public Health, University of Colorado, Anschutz, Aurora, CO
- James L N Wood
  - Disease Dynamics Unit, Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
- Pleuni S Pennings
  - Department of Biology, San Francisco State University, San Francisco, CA
- Richard A Goldstein
  - Division of Infection & Immunity, University College London, London, United Kingdom
|
16
|
Matyášek R, Kovařík A. Mutation Patterns of Human SARS-CoV-2 and Bat RaTG13 Coronavirus Genomes Are Strongly Biased Towards C>U Transitions, Indicating Rapid Evolution in Their Hosts. Genes (Basel) 2020; 11:E761. [PMID: 32646049 PMCID: PMC7397057 DOI: 10.3390/genes11070761] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 05/03/2020] [Revised: 06/22/2020] [Accepted: 06/29/2020] [Indexed: 12/17/2022] Open
Abstract
The pandemic caused by the spread of SARS-CoV-2 has led to considerable interest in its evolutionary origin and genome structure. Here, we analyzed mutation patterns in 34 human SARS-CoV-2 isolates and a closely related RaTG13 isolated from Rhinolophus affinis (a horseshoe bat). We also evaluated the CpG dinucleotide contents in SARS-CoV-2 and other human and animal coronavirus genomes. Out of 1136 single nucleotide variations (~4% divergence) between human SARS-CoV-2 and bat RaTG13, 682 (60%) can be attributed to C>U and U>C substitutions, far exceeding other types of substitutions. An accumulation of C>U mutations was also observed in SARS-CoV-2 variants that arose within the human population. Globally, the C>U substitutions increased the frequency of codons for hydrophobic amino acids in SARS-CoV-2 peptides, while U>C substitutions decreased it. In contrast to most other coronaviruses, both SARS-CoV-2 and RaTG13 exhibited CpG depletion in their genomes. The data suggest that C-to-U conversion mediated by C deamination played a significant role in the evolution of the SARS-CoV-2 coronavirus. We hypothesize that the high-frequency C>U transitions reflect virus adaptation processes in their hosts, and that SARS-CoV-2 could have been evolving for a relatively long period in humans following the transfer from animals before spreading worldwide.
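As an illustration of the kind of tally this abstract reports, the sketch below counts directed substitutions between two pre-aligned RNA sequences and pools the C>U and U>C classes. It is a minimal sketch only: the function name `substitution_spectrum` and the toy sequences are hypothetical, the abstract does not describe the authors' actual pipeline, and a pairwise comparison by itself cannot establish the direction of a change without an ancestral reference.

```python
from collections import Counter


def substitution_spectrum(ref: str, alt: str) -> Counter:
    """Count directed substitutions (e.g. 'C>U') between two aligned RNA sequences.

    Hypothetical helper for illustration: both inputs must already be aligned
    to the same length; gaps and ambiguous bases are skipped.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for r, a in zip(ref.upper(), alt.upper()):
        if r != a and r in "ACGU" and a in "ACGU":
            counts[f"{r}>{a}"] += 1
    return counts


# Toy sequences (not real viral data): two C>U differences.
toy_ref = "ACGUCCGA"
toy_alt = "AUGUCUGA"
spectrum = substitution_spectrum(toy_ref, toy_alt)

# Pool both directions of the C/U transition, as the abstract does.
cu_transitions = spectrum["C>U"] + spectrum["U>C"]

# The percentage quoted in the abstract: 682 of 1136 variations.
reported_share = round(682 / 1136 * 100)
```

With real genomes, the two sequences would first have to be aligned by a dedicated alignment tool; that step is outside the scope of this sketch.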
Affiliation(s)
- Aleš Kovařík
  - Laboratory of Molecular Epigenetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 61265 Brno, Czech Republic
|
17
|
Ficarelli M, Antzin-Anduetza I, Hugh-White R, Firth AE, Sertkaya H, Wilson H, Neil SJD, Schulz R, Swanson CM. CpG Dinucleotides Inhibit HIV-1 Replication through Zinc Finger Antiviral Protein (ZAP)-Dependent and -Independent Mechanisms. J Virol 2020; 94:e01337-19. [PMID: 31748389 PMCID: PMC7158733 DOI: 10.1128/jvi.01337-19] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 08/09/2019] [Accepted: 11/06/2019] [Indexed: 02/07/2023] Open
Abstract
CpG dinucleotides are suppressed in the genomes of many vertebrate RNA viruses, including HIV-1. The cellular antiviral protein ZAP (zinc finger antiviral protein) binds CpGs and inhibits HIV-1 replication when CpGs are introduced into the viral genome. However, it is not known if ZAP-mediated restriction is the only mechanism driving CpG suppression. To determine how CpG dinucleotides affect HIV-1 replication, we increased their abundance in multiple regions of the viral genome and analyzed the effect on RNA expression, protein abundance, and infectious-virus production. We found that the antiviral effect of CpGs was not correlated with their abundance. Interestingly, CpGs inserted into some regions of the genome sensitize the virus to ZAP antiviral activity more efficiently than insertions into other regions, and this sensitivity can be modulated by interferon treatment or ZAP overexpression. Furthermore, the sensitivity of the virus to endogenous ZAP was correlated with its sensitivity to the ZAP cofactor KHNYN. Finally, we show that CpGs in some contexts can also inhibit HIV-1 replication by ZAP-independent mechanisms, and one of these is the activation of a cryptic splice site at the expense of a canonical splice site. Overall, we show that the location and sequence context of the CpG in the viral genome determines its antiviral activity.
IMPORTANCE: Some RNA virus genomes are suppressed in the nucleotide combination of a cytosine followed by a guanosine (CpG), indicating that they are detrimental to the virus. The antiviral protein ZAP binds viral RNA containing CpGs and prevents the virus from multiplying. However, it remains unknown how the number and position of CpGs in viral genomes affect restriction by ZAP and whether CpGs have other antiviral mechanisms. Importantly, manipulating the CpG content in viral genomes could help create new vaccines.
HIV-1 shows marked CpG suppression, and by introducing CpGs into its genome, we show that ZAP efficiently targets a specific region of the viral genome, that the number of CpGs does not predict the magnitude of antiviral activity, and that CpGs can inhibit HIV-1 gene expression through a ZAP-independent mechanism. Overall, the position of CpGs in the HIV-1 genome determines the magnitude and mechanism through which they inhibit the virus.
Affiliation(s)
- Mattia Ficarelli
  - Department of Infectious Diseases, King's College London, London, United Kingdom
- Rupert Hugh-White
  - Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Andrew E Firth
  - Division of Virology, University of Cambridge, Cambridge, United Kingdom
- Helin Sertkaya
  - Department of Infectious Diseases, King's College London, London, United Kingdom
- Harry Wilson
  - Department of Infectious Diseases, King's College London, London, United Kingdom
- Stuart J D Neil
  - Department of Infectious Diseases, King's College London, London, United Kingdom
- Reiner Schulz
  - Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Chad M Swanson
  - Department of Infectious Diseases, King's College London, London, United Kingdom
|
18
|
CANCERSIGN: a user-friendly and robust tool for identification and classification of mutational signatures and patterns in cancer genomes. Sci Rep 2020; 10:1286. [PMID: 31992766 PMCID: PMC6987109 DOI: 10.1038/s41598-020-58107-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/13/2019] [Accepted: 01/06/2020] [Indexed: 11/20/2022] Open
Abstract
Analysis of cancer mutational signatures has been instrumental in identification of responsible endogenous and exogenous molecular processes in cancer. The quantitative approach used to deconvolute mutational signatures is becoming an integral part of cancer research. Therefore, development of a stand-alone tool with a user-friendly interface for analysis of cancer mutational signatures is necessary. In this manuscript, we introduce CANCERSIGN, which enables users to identify 3-mer and 5-mer mutational signatures within whole genome, whole exome or pooled samples. Additionally, this tool enables users to perform clustering on tumor samples based on the proportion of mutational signatures in each sample. Using CANCERSIGN, we analysed all the whole genome somatic mutation datasets profiled by the International Cancer Genome Consortium (ICGC) and identified a number of novel signatures. By examining signatures found in exonic and non-exonic regions of the genome using WGS and comparing these to signatures found in WES data, we observe that WGS can identify additional non-exonic signatures that are enriched in the non-coding regions of the genome, while the deeper sequencing of WES may help identify weak signatures that are otherwise missed in shallower WGS data.
|
19
|
Zu W, Zhang H, Lan X, Tan X. Genome-wide evolution analysis reveals low CpG contents of fast-evolving genes and identifies antiviral microRNAs. J Genet Genomics 2020; 47:49-60. [DOI: 10.1016/j.jgg.2019.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/22/2019] [Revised: 11/11/2019] [Accepted: 12/03/2019] [Indexed: 01/28/2023]
|
20
|
Cortés-Rubio CN, Salgado-Montes de Oca G, Prado-Galbarro FJ, Matías-Florentino M, Murakami-Ogasawara A, Kuri-Cervantes L, Carranco-Arenas AP, Ormsby CE, Cortés-Rubio IK, Reyes-Terán G, Ávila-Ríos S. Longitudinal variation in human immunodeficiency virus long terminal repeat methylation in individuals on suppressive antiretroviral therapy. Clin Epigenetics 2019; 11:134. [PMID: 31519219 PMCID: PMC6743183 DOI: 10.1186/s13148-019-0735-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/14/2019] [Accepted: 08/30/2019] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: Persistence of latent, replication-competent provirus in CD4+ T cells of human immunodeficiency virus (HIV)-infected individuals on antiretroviral treatment (ART) is the main obstacle for virus eradication. Methylation of the proviral 5' long terminal repeat (LTR) promoter region has been proposed as a possible mechanism contributing to HIV latency; however, conflicting observations exist regarding its relevance. We assessed 5'-LTR methylation profiles in total CD4+ T cells from blood of 12 participants on short-term ART (30 months) followed up for 2 years, and a cross-sectional group of participants with long-term ART (6-15 years), using next generation sequencing. We then looked for associations between specific 5'-LTR methylation patterns and baseline and follow-up clinical characteristics.
RESULTS: 5'-LTR methylation was observed in all participants and behaved dynamically. The number of 5'-LTR variants found per sample ranged from 1 to 13, with median sequencing depth of 16270× (IQR 4107×-46760×). An overall significant 5'-LTR methylation increase was observed at month 42 compared to month 30 (median CpG Methylation Index: 74.7% vs. 0%, p = 0.025). This methylation increase was evident in a subset of participants (methylation increase group), while the rest maintained fairly high and constant methylation (constant methylation group). Persons in the methylation increase group were younger, had higher CD4+ T cell gain, larger CD8% decrease, and larger CD4/CD8 ratio change after 48 months on ART (all p < 0.001). Using principal component analysis, the constant methylation and methylation increase groups showed low evidence of separation along time (factor 2: p = 0.04). Variance was largely explained (21%) by age, CD4+/CD8+ T cell change, and CD4+ T cell subpopulation proportions. Persons with long-term ART showed overall high methylation (median CpG Methylation Index: 78%; IQR 71-87%). No differences were observed in residual plasma viral load or proviral load comparing individuals on short-term (both at 30 or 42 months) and long-term ART.
CONCLUSIONS: Our study shows evidence that HIV 5'-LTR methylation in total CD4+ T cells is dynamic along time and that it can follow different temporal patterns that are associated with a combination of baseline and follow-up clinical characteristics. These observations may account for differences observed between previous contrasting studies.
Affiliation(s)
- César N. Cortés-Rubio
  - Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Tlalpan 4502, 14080 Mexico City, Mexico
- Gonzalo Salgado-Montes de Oca
  - Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Tlalpan 4502, 14080 Mexico City, Mexico
- Margarita Matías-Florentino
  - Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Tlalpan 4502, 14080 Mexico City, Mexico
- Akio Murakami-Ogasawara
  - Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Tlalpan 4502, 14080 Mexico City, Mexico
- Leticia Kuri-Cervantes
  - Department of Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ana P. Carranco-Arenas
  - Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Tlalpan 4502, 14080 Mexico City, Mexico
- Christopher E. Ormsby
  - Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Tlalpan 4502, 14080 Mexico City, Mexico
- Gustavo Reyes-Terán
  - Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Tlalpan 4502, 14080 Mexico City, Mexico
- Santiago Ávila-Ríos
  - Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Tlalpan 4502, 14080 Mexico City, Mexico
|
21
|
Abstract
DNA methylation is an epigenetic mechanism most commonly associated with transcriptional repression. While it is clear that DNA methylation can silence HIV proviral expression in in vitro latency models, its correlation with HIV persistence and expression in vivo is ambiguous, particularly in persons living with HIV (PLWH) receiving antiretroviral therapy (ART). Several factors potentially contribute to discrepancies between results in the literature, including differences in integration sites, functional proviral load, sampling bias, and stochastic PCR amplification. Recent studies into genomic features of cytosine methylation sites in mammalian genes offer potentially significant insights into this mechanism. Here, we discuss the importance of these factors in the context of HIV.
|
22
|
Antzin-Anduetza I, Mahiet C, Granger LA, Odendall C, Swanson CM. Increasing the CpG dinucleotide abundance in the HIV-1 genomic RNA inhibits viral replication. Retrovirology 2017; 14:49. [PMID: 29121951 PMCID: PMC5679385 DOI: 10.1186/s12977-017-0374-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 09/12/2017] [Accepted: 11/01/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The human immunodeficiency virus type 1 (HIV-1) structural protein Gag is necessary and sufficient to form viral particles. In addition to encoding the amino acid sequence for Gag, the underlying RNA sequence could encode cis-acting elements or nucleotide biases that are necessary for viral replication. Furthermore, RNA sequences that inhibit viral replication could be suppressed in gag. However, the functional relevance of RNA elements and nucleotide biases that promote or repress HIV-1 replication remains poorly understood.
RESULTS: To characterize if the RNA sequence in gag controls HIV-1 replication, the matrix (MA) region was codon modified, allowing the RNA sequence to be altered without affecting the protein sequence. Codon modification of nucleotides (nt) 22-261 or 22-378 in gag inhibited viral replication by decreasing genomic RNA (gRNA) abundance, gRNA stability, Gag expression, virion production and infectivity. Comparing the effect of these point mutations to deletions of the same region revealed that the mutations inhibited infectious virus production while the deletions did not. This demonstrated that codon modification introduced inhibitory sequences. There is a much lower than expected frequency of CpG dinucleotides in HIV-1 and codon modification introduced a substantial increase in CpG abundance. To determine if they are necessary for inhibition of HIV-1 replication, codons introducing CpG dinucleotides were mutated back to the wild type codon, which restored efficient Gag expression and infectious virion production. To determine if they are sufficient to inhibit viral replication, CpG dinucleotides were inserted into gag in the absence of other changes. The increased CpG dinucleotide content decreased HIV-1 infectivity and viral replication.
CONCLUSIONS: The HIV-1 RNA sequence contains low abundance of CpG dinucleotides. Increasing the abundance of CpG dinucleotides inhibits multiple steps of the viral life cycle, providing a functional explanation for why CpG dinucleotides are suppressed in HIV-1.
Affiliation(s)
- Irati Antzin-Anduetza
  - Department of Infectious Diseases, King's College London, 3rd Floor Borough Wing, Guy's Hospital, London, SE1 9RT, UK
- Charlotte Mahiet
  - Department of Infectious Diseases, King's College London, 3rd Floor Borough Wing, Guy's Hospital, London, SE1 9RT, UK
- Luke A Granger
  - Department of Infectious Diseases, King's College London, 3rd Floor Borough Wing, Guy's Hospital, London, SE1 9RT, UK
- Charlotte Odendall
  - Department of Infectious Diseases, King's College London, 3rd Floor Borough Wing, Guy's Hospital, London, SE1 9RT, UK
- Chad M Swanson
  - Department of Infectious Diseases, King's College London, 3rd Floor Borough Wing, Guy's Hospital, London, SE1 9RT, UK
|
23
|
The CpG dinucleotide content of the HIV-1 envelope gene may predict disease progression. Sci Rep 2017; 7:8162. [PMID: 28811638 PMCID: PMC5557942 DOI: 10.1038/s41598-017-08716-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/14/2017] [Accepted: 07/12/2017] [Indexed: 11/28/2022] Open
Abstract
The clinical course of HIV-1 varies greatly among infected individuals. Despite extensive research, virus factors associated with slow-progression remain poorly understood. Identification of unique HIV-1 genomic signatures linked to slow-progression remains elusive. We investigated CpG dinucleotide content in the HIV-1 envelope gene as a potential virus factor in disease progression. We analysed 1808 HIV-1 envelope gene sequences from three independent longitudinal studies; this included 1280 sequences from twelve typical-progressors and 528 sequences from six slow-progressors. Relative abundance of CpG dinucleotides and relative synonymous codon usage (RSCU) for CpG-containing codons among HIV-1 envelope gene sequences from typical-progressors and slow-progressors were analysed. HIV-1 envelope gene sequences from slow-progressors have a higher CpG dinucleotide content and a greater number of CpG-containing codons than those from typical-progressors. Our findings suggest that the observed differences in CpG content between typical-progressors and slow-progressors are not explained by differences in the mononucleotide content. Our results also highlight that the high CpG content in the HIV-1 envelope gene from slow-progressors is observed immediately after seroconversion. Thus, the CpG dinucleotide content of the HIV-1 envelope gene is a potential virus-related factor that is linked to disease progression. The CpG dinucleotide content of the HIV-1 envelope gene may help predict HIV-1 disease progression at early stages after seroconversion.
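The abstract does not state how relative CpG abundance was computed. A commonly used measure is the observed/expected odds ratio, rho = f(CpG) / (f(C) × f(G)), which corrects the dinucleotide frequency for the mononucleotide content. The sketch below assumes that definition; the function name `cpg_relative_abundance` and the toy sequence are hypothetical and not taken from the paper.

```python
def cpg_relative_abundance(seq: str) -> float:
    """Observed/expected CpG ratio: f(CG) / (f(C) * f(G)).

    Assumed definition for illustration; values below 1 indicate CpG
    under-representation relative to the mononucleotide content.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    f_c = seq.count("C") / n
    f_g = seq.count("G") / n
    if f_c == 0 or f_g == 0:
        raise ValueError("ratio is undefined when C or G is absent")
    # Count CG over all n - 1 overlapping dinucleotide positions.
    n_cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    f_cg = n_cg / (n - 1)
    return f_cg / (f_c * f_g)


# Toy sequence (not real envelope data): 2 CpGs in 7 dinucleotide positions.
toy_sequence = "ACGCGTTA"
rho = cpg_relative_abundance(toy_sequence)
```

Comparing this ratio between two groups of sequences, rather than the raw CpG count, is one way to check that a difference is not simply a consequence of differing C and G content.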
|
24
|
Dinucleotide Composition in Animal RNA Viruses Is Shaped More by Virus Family than by Host Species. J Virol 2017; 91:JVI.02381-16. [PMID: 28148785 DOI: 10.1128/jvi.02381-16] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 12/07/2016] [Accepted: 01/23/2017] [Indexed: 12/12/2022]
Abstract
Viruses use the cellular machinery of their hosts for replication. It has therefore been proposed that the nucleotide and dinucleotide compositions of viruses should match those of their host species. If this is upheld, it may then be possible to use dinucleotide composition to predict the true host species of viruses sampled in metagenomic surveys. However, it is also clear that different taxonomic groups of viruses tend to have distinctive patterns of dinucleotide composition that may be independent of host species. To determine the relative strength of the effect of host versus virus family in shaping dinucleotide composition, we performed a comparative analysis of 20 RNA virus families from 15 host groupings, spanning two animal phyla and more than 900 virus species. In particular, we determined the odds ratios for the 16 possible dinucleotides and performed a discriminant analysis to evaluate the capability of virus dinucleotide composition to predict the correct virus family or host taxon from which it was isolated. Notably, while 81% of the data analyzed here were predicted to the correct virus family, only 62% of these data were predicted to their correct subphylum/class host and a mere 32% to their correct mammalian order. Similarly, dinucleotide composition has weak predictive power for different hosts within individual virus families. We therefore conclude that dinucleotide composition is generally uniform within a virus family but reflects that of its host species less well. This has obvious implications for attempts to accurately predict host species from virus genome sequences alone.
IMPORTANCE Determining the processes that shape virus genomes is central to understanding virus evolution and emergence. One question of particular importance is why nucleotide and dinucleotide frequencies differ so markedly between viruses. In particular, it is currently unclear whether host species or virus family has the greater impact on dinucleotide frequencies and whether dinucleotide composition can be used to accurately predict host species. Using a comparative analysis, we show that dinucleotide composition has a strong phylogenetic association across different RNA virus families, such that dinucleotide composition can predict the family from which a virus sequence has been isolated. Conversely, dinucleotide composition has poorer predictive power for the different host species within a virus family and across different virus families, indicating that the host has a relatively small impact on the dinucleotide composition of a virus genome.
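The odds ratios for the 16 possible dinucleotides mentioned in this abstract can be sketched as follows. This assumes the commonly used observed/expected form, rho_xy = f_xy / (f_x * f_y), where f_xy is the frequency of dinucleotide xy and f_x, f_y are the mononucleotide frequencies; the abstract does not state the exact formula the authors used, and the function name `dinucleotide_odds_ratios` is hypothetical.

```python
from collections import Counter
from itertools import product

VALID = set("ACGT")


def dinucleotide_odds_ratios(seq):
    """Return observed/expected ratios for all 16 dinucleotides.

    Assumed formula: rho_xy = f_xy / (f_x * f_y). A ratio well below 1
    indicates under-representation of that dinucleotide.
    """
    seq = seq.upper().replace("U", "T")
    mono = Counter(b for b in seq if b in VALID)
    # Count overlapping pairs, skipping any pair with an ambiguous base.
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    di = Counter(p for p in pairs if set(p) <= VALID)
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0:
        raise ValueError("sequence needs at least two adjacent unambiguous bases")
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        observed = di[x + y] / n_di
        # Undefined when one of the two bases never occurs in the sequence.
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    for pair, rho in sorted(dinucleotide_odds_ratios("ACGTTGCAACGGTCA").items()):
        print(pair, round(rho, 3))
```

The resulting 16-value vector per genome is the kind of feature set a discriminant analysis could take as input, although the classifier itself is not shown here.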
|