1
Podvalnyi A, Kopernik A, Sayganova M, Woroncow M, Zobkova G, Smirnova A, Esibov A, Deviatkin A, Volchkov P, Albert E. Quantitative Analysis of Pseudogene-Associated Errors During Germline Variant Calling. Int J Mol Sci 2025; 26:363. [PMID: 39796219 PMCID: PMC11719938 DOI: 10.3390/ijms26010363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2024] [Revised: 12/30/2024] [Accepted: 01/02/2025] [Indexed: 01/13/2025] Open
Abstract
A pseudogene is a non-functional copy of a protein-coding gene. Processed pseudogenes, a major pseudogene class created by the reverse transcription of mRNA and subsequent integration of the resulting cDNA into the genome, represent a significant challenge in genome analysis due to their high sequence similarity to the parent genes and their frequent absence from the reference genome. This homology can lead to errors in variant identification, as sequences derived from processed pseudogenes can be incorrectly assigned to parental genes, complicating correct variant calling. In this study, we quantified the occurrence of variant calling errors associated with pseudogenes, generated by the most popular germline variant callers, namely GATK-HC, DRAGEN, and DeepVariant, when analysing 30x human whole-genome sequencing data (n = 13,307). The results show that the presence of pseudogenes can interfere with variant calling, leading to false positive identifications of potentially clinically relevant variants. Compared to other approaches, DeepVariant was the most effective in correcting these errors.
Affiliation(s)
- Artem Podvalnyi
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia (A.D.)
  - Faculty of Computer Science, HSE University, 101000 Moscow, Russia
- Arina Kopernik
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia (A.D.)
- Mariia Sayganova
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia (A.D.)
- Mary Woroncow
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, 119991 Moscow, Russia
- Anton Esibov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia (A.D.)
- Andrey Deviatkin
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia (A.D.)
- Pavel Volchkov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia (A.D.)
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, 119991 Moscow, Russia
- Eugene Albert
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia (A.D.)
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, 119991 Moscow, Russia
2
Wei Z, Sun J, Li Q, Yao T, Zeng H, Wang Y. RetroScan: An Easy-to-Use Pipeline for Retrocopy Annotation and Visualization. Front Genet 2021; 12:719204. [PMID: 34484306 PMCID: PMC8415311 DOI: 10.3389/fgene.2021.719204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/02/2021] [Accepted: 07/26/2021] [Indexed: 11/13/2022] Open
Abstract
Retrocopies, which are considered “junk genes,” are occasionally formed via the insertion of reverse-transcribed mRNAs at new positions in the genome. However, an increasing number of recent studies have shown that some retrocopies exhibit new biological functions and may contribute to genome evolution. Hence, the identification of retrocopies has become very meaningful for studying gene duplication and new gene generation. Current pipelines identify retrocopies through complex operations using alignment programs and filter scripts in a step-by-step manner. Therefore, there is an urgent need for a simple and convenient retrocopy annotation tool. Here, we report the development of RetroScan, a publicly available and easy-to-use tool for scanning, annotating and displaying retrocopies, consisting of two components: an analysis pipeline and a visual interface. The pipeline integrates a series of bioinformatics software programs and scripts for identifying retrocopies in just one line of command. Compared with previous methods, RetroScan increases accuracy and reduces false-positive results. We also provide a Shiny app for visualization. It displays information on retrocopies and their parental genes that can be used for the study of retrocopy structure and evolution. RetroScan is available at https://github.com/Vicky123wzy/RetroScan.
Affiliation(s)
- Zhaoyuan Wei
  - State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Chongqing, China
  - Biological Science Research Center, Southwest University, Chongqing, China
- Jiahe Sun
  - Biological Science Research Center, Southwest University, Chongqing, China
- Qinhui Li
  - State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Chongqing, China
- Ting Yao
  - State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Chongqing, China
- Haiyue Zeng
  - Biological Science Research Center, Southwest University, Chongqing, China
- Yi Wang
  - State Key Laboratory of Silkworm Genome Biology, Biological Science Research Center, Southwest University, Chongqing, China
  - Biological Science Research Center, Southwest University, Chongqing, China
3
The pseudogene problem and RT-qPCR data normalization; SYMPK: a suitable reference gene for papillary thyroid carcinoma. Sci Rep 2020; 10:18408. [PMID: 33110161 PMCID: PMC7592052 DOI: 10.1038/s41598-020-75495-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/02/2020] [Accepted: 10/14/2020] [Indexed: 01/23/2023] Open
Abstract
In RT-qPCR, accuracy requires multiple levels of standardization, but results could be obfuscated by human errors and technical limitations. Data normalization against suitable reference genes is critical, yet their observed expression can be confounded by pseudogenes. Eight reference genes were selected based on literature review and analysis of papillary thyroid carcinoma (PTC) microarray data. RNA extraction and cDNA synthesis were followed by RT-qPCR amplification in triplicate with exon-junction or intron-spanning primers. Several statistical analyses were applied using Microsoft Excel, NormFinder, and BestKeeper. In normal tissues, the lowest coefficient of variation (CqCV%) and the lowest maximum fold change (MFC) were recorded for PYCR1 and SYMPK, respectively. In PTC tissues, SYMPK had the lowest CqCV% (5.16%) and MFC (1.17). According to NormFinder, the best reference combination was SYMPK and ACTB (stability value = 0.209). BestKeeper suggested SYMPK as the best reference in both normal (r = 0.969) and PTC tissues (r = 0.958). SYMPK is suggested as the best reference gene for overcoming the pseudogene problem in RT-qPCR data normalization, with a stability value of 0.319.
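The two descriptive statistics named in this abstract can be illustrated with a minimal sketch. It assumes that CqCV% is the coefficient of variation of the Cq values (sample standard deviation divided by the mean, times 100) and that MFC is the ratio of the largest to the smallest Cq value; the abstract does not spell out either definition, and the function names and Cq values below are hypothetical.

```python
from statistics import mean, stdev


def cq_cv_percent(cq_values):
    """Coefficient of variation of Cq values in percent (assumed definition)."""
    return stdev(cq_values) / mean(cq_values) * 100


def max_fold_change(cq_values):
    """Ratio of the largest to the smallest Cq value (assumed definition)."""
    return max(cq_values) / min(cq_values)


# Hypothetical Cq values for one candidate reference gene across three samples
example_cq = [24.0, 25.0, 26.0]
print(round(cq_cv_percent(example_cq), 2))
print(round(max_fold_change(example_cq), 3))
```

Under these assumptions, a candidate reference gene is more stable when both values are lower across samples.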
4
Fürst D, Tsamadou C, Neuchel C, Schrezenmeier H, Mytilineos J, Weinstock C. Next-Generation Sequencing Technologies in Blood Group Typing. Transfus Med Hemother 2019; 47:4-13. [PMID: 32110189 DOI: 10.1159/000504765] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/01/2019] [Accepted: 11/07/2019] [Indexed: 12/14/2022] Open
Abstract
Sequencing of the human genome has led to the definition of the genes for most of the relevant blood group systems, and the polymorphisms responsible for most of the clinically relevant blood group antigens are characterized. Molecular blood group typing is used in situations where erythrocytes are not available or where serological testing was inconclusive or not possible due to the lack of antisera. Also, molecular testing may be more cost-effective in certain situations. Molecular typing approaches are mostly based on either PCR with specific primers, DNA hybridization, or DNA sequencing. Particularly the transition of sequencing techniques from Sanger-based sequencing to next-generation sequencing (NGS) technologies has led to exciting new possibilities in blood group genotyping. We describe briefly the currently available NGS platforms and their specifications, depict the genetic background of blood group polymorphisms, and discuss applications for NGS approaches in immunohematology. As an example, we delineate a protocol for large-scale donor blood group screening established and in use at our institution. Furthermore, we discuss technical challenges and limitations as well as the prospect for future developments, including long-read sequencing technologies.
Affiliation(s)
- Daniel Fürst
  - Institute for Clinical Transfusion Medicine and Immunogenetics Ulm, German Red Cross Blood Transfusion Service, Baden Wuerttemberg/Hessen, and University Hospital Ulm, Ulm, Germany
  - Institute of Transfusion Medicine, University of Ulm, Ulm, Germany
- Chrysanthi Tsamadou
  - Institute for Clinical Transfusion Medicine and Immunogenetics Ulm, German Red Cross Blood Transfusion Service, Baden Wuerttemberg/Hessen, and University Hospital Ulm, Ulm, Germany
  - Institute of Transfusion Medicine, University of Ulm, Ulm, Germany
- Christine Neuchel
  - Institute for Clinical Transfusion Medicine and Immunogenetics Ulm, German Red Cross Blood Transfusion Service, Baden Wuerttemberg/Hessen, and University Hospital Ulm, Ulm, Germany
  - Institute of Transfusion Medicine, University of Ulm, Ulm, Germany
- Hubert Schrezenmeier
  - Institute for Clinical Transfusion Medicine and Immunogenetics Ulm, German Red Cross Blood Transfusion Service, Baden Wuerttemberg/Hessen, and University Hospital Ulm, Ulm, Germany
  - Institute of Transfusion Medicine, University of Ulm, Ulm, Germany
- Joannis Mytilineos
  - Institute for Clinical Transfusion Medicine and Immunogenetics Ulm, German Red Cross Blood Transfusion Service, Baden Wuerttemberg/Hessen, and University Hospital Ulm, Ulm, Germany
  - Institute of Transfusion Medicine, University of Ulm, Ulm, Germany
- Christof Weinstock
  - Institute for Clinical Transfusion Medicine and Immunogenetics Ulm, German Red Cross Blood Transfusion Service, Baden Wuerttemberg/Hessen, and University Hospital Ulm, Ulm, Germany
  - Institute of Transfusion Medicine, University of Ulm, Ulm, Germany
5
Donahue KL, Broadley HJ, Elkinton JS, Burand JP, Huang W, Andersen JC. Using the SSU, ITS, and Ribosomal DNA Operon Arrangement to Characterize Two Microsporidia Infecting Bruce Spanworm, Operophtera bruceata (Lepidoptera: Geometridae). J Eukaryot Microbiol 2018; 66:424-434. [DOI: 10.1111/jeu.12685] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/14/2018] [Accepted: 08/03/2018] [Indexed: 01/17/2023]
Affiliation(s)
- Katelyn L. Donahue
  - Biology Department, University of Massachusetts, Amherst, Massachusetts 01003, USA
  - Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire 03756, USA
- Hannah J. Broadley
  - Graduate Program in Organismic and Evolutionary Biology, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Joseph S. Elkinton
  - Graduate Program in Organismic and Evolutionary Biology, University of Massachusetts, Amherst, Massachusetts 01003, USA
  - Department of Environmental Conservation, University of Massachusetts, Amherst, Massachusetts 01003, USA
- John P. Burand
  - Microbiology Department, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Wei‐Fone Huang
  - College of Bee Science, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Jeremy C. Andersen
  - Department of Environmental Conservation, University of Massachusetts, Amherst, Massachusetts 01003, USA
6
Balakirev ES, Chechetkin VR, Lobzin VV, Ayala FJ. Computational methods of identification of pseudogenes based on functionality: entropy and GC content. Methods Mol Biol 2014; 1167:41-62. [PMID: 24823770 DOI: 10.1007/978-1-4939-0835-6_4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Spectral entropy and GC content analyses reveal comprehensive structural features of DNA sequences. To illustrate the significance of these features, we analyze the β-esterase gene cluster, including the Est-6 gene and the ψEst-6 putative pseudogene, in seven species of the Drosophila melanogaster subgroup. The spectral entropies show distinctly lower structural ordering for ψEst-6 than for Est-6 in all species studied. However, entropy accumulation is not a completely random process for either gene and is shown to be nucleotide dependent. Furthermore, GC content in synonymous positions is uniformly higher in Est-6 than in ψEst-6, in agreement with the reduced GC content generally observed in pseudogenes and nonfunctional sequences. The observed differences in entropy and GC content reflect an evolutionary shift associated with the process of pseudogenization and subsequent functional divergence of ψEst-6 and Est-6 after the duplication event. The data obtained show the relevance and significance of entropy and GC content analyses for pseudogene identification and for the comparative study of gene-pseudogene evolution.
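The GC content comparison mentioned in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' method: it uses GC content at third codon positions (GC3) as a simple stand-in for "synonymous positions", which is an assumption, and the function names and toy sequences are hypothetical. Spectral entropy is not covered here.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def gc3_content(coding_seq):
    """GC fraction at third codon positions of an in-frame coding sequence.

    Third positions are used here as an assumed proxy for synonymous sites.
    """
    return gc_content(coding_seq[2::3])


# Hypothetical in-frame toy sequences, not real Est-6 or psiEst-6 data
gene_like = "ATGGCCGAGCTG"
pseudogene_like = "ATGGCAGAACTA"
print(gc3_content(gene_like), gc3_content(pseudogene_like))
```

A lower GC3 value for the pseudogene-like sequence than for the gene-like one would match the direction of the difference the abstract reports, though real analyses would work on aligned orthologous sequences.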
Affiliation(s)
- Evgeniy S Balakirev
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
7
Plastid trnF pseudogenes are present in Jaltomata, the sister genus of Solanum (Solanaceae): molecular evolution of tandemly repeated structural mutations. Gene 2013; 530:143-50. [PMID: 23962687 DOI: 10.1016/j.gene.2013.08.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/11/2013] [Revised: 08/05/2013] [Accepted: 08/06/2013] [Indexed: 11/24/2022]
Abstract
Extensive gene duplication arranged in a tandem array is rare in the plastome of embryophytes. Interestingly, we found pseudogene copies of the trnF gene in the genus Jaltomata, the sister genus of Solanum, where such gene duplication has been previously reported. In each Jaltomata sequence available we found two pseudogene copies in close 5'-proximity to the original functional gene. The size of each pseudogene copy ranged between 17 and 48 bp and the anticodon domain was identified as the most conserved element. A common ATT(G)n motif is particularly interesting and its modifications were found to border the 3' of the duplicated regions. Other motifs were partial residues, or entire parts of the T- and D-domains, and both domains proved to be variable in length among the pseudogenes identified. The residues of the 3' and 5' acceptor stem were not found among the copies. We further compared the newly discovered copies of Jaltomata with those previously described from Solanum and inferred phylogenetic relationships of the aligned copies. The evolution of Solanum copies, in contrast to Jaltomata, is hard to explain as resulting only in parsimonious changes since reticulate evolutionary patterns were detected among the copies. The dynamic evolutionary patterns of Solanum might be explained by possible inter- or intrachromosomal recombination.
8
Abstract
Pseudogenes are ubiquitous and abundant in genomes. They were once called "genomic fossils" and were treated as "junk DNA" for several years. Nevertheless, it has been recognized that some pseudogenes play essential roles in regulating their parent genes, and many pseudogenes are transcribed into RNA. Pseudogene transcripts may also form small interfering RNA or decrease cellular miRNA concentration. Thus, pseudogenes can regulate tumor suppressors and oncogenes. Their essential functions drew the attention of our research group during current work on heat shock protein 90, a chaperone of oncogenes. The paper reviews current knowledge on pseudogenes and evaluates preliminary results of the chaperone data. Current efforts to understand pseudogene interactions help to understand the functions of a genome.
Affiliation(s)
- Yusuf Tutar
  - Department of Biochemistry, Faculty of Medicine, Cumhuriyet University, 58140 Sivas, Turkey
  - Department of Chemistry, Faculty of Science, Cumhuriyet University, 58140 Sivas, Turkey
  - CUTFAM Research Center, Faculty of Medicine, Cumhuriyet University, 58140 Sivas, Turkey