1
Zhou K, Gheybi K, Soh PXY, Hayes VM. Evaluating variant pathogenicity prediction tools to establish African inclusive guidelines for germline genetic testing. Communications Medicine 2025; 5:157. [PMID: 40328947 PMCID: PMC12056225 DOI: 10.1038/s43856-025-00883-x] [Received: 10/06/2024] [Accepted: 04/24/2025] [Indexed: 05/08/2025]
Abstract
BACKGROUND Genetic germline testing is restricted for African patients. Lack of ancestrally relevant genomic data perpetuated by African diversity has resulted in European-biased curated clinical variant databases and pathogenic prediction guidelines. While numerous variant pathogenicity prediction tools (VPPTs) exist, their performance has yet to be established within the context of African diversity. METHODS To address this limitation, we assessed 54 VPPTs for predictive performance (sensitivity, specificity, false positive and negative rates) across 145,291 known pathogenic or benign variants derived from 50 Southern African and 50 European men matched for advanced prostate cancer. Prioritising VPPTs for optimal ancestral performance, we screened 5.3 million variants of unknown significance for predicted functional and oncogenic potential. RESULTS We observe a 2.1- and 4.1-fold increase in the number of known and predicted rare pathogenic or benign variants, respectively, against a 1.6-fold decrease in the number of available interrogated variants in our European over African data. Although sensitivity was significantly lower for our African data overall (0.66 vs 0.71, p = 9.86E-06), MetaSVM, CADD, Eigen-raw, BayesDel-noAF, phyloP100way-vertebrate and MVP outperformed irrespective of ancestry. Conversely, MutationTaster, DANN, LRT and GERP-RS were African-specific top performers, while MutationAssessor, PROVEAN, LIST-S2 and REVEL are European-specific. Using these pathogenic prediction workflows, we narrow the ancestral gap for potentially deleterious and oncogenic variant prediction in favour of our African data by 1.15- and 1.1-fold, respectively. CONCLUSION Although VPPT sensitivity favours European data, our findings provide guidelines for VPPT selection to maximise rare pathogenic variant prediction for African disease studies.
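The four performance measures named in the abstract above (sensitivity, specificity, false positive rate, false negative rate) all derive from one confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' pipeline; the function name `prediction_rates` and the toy labels are illustrative assumptions.

```python
from typing import NamedTuple, Sequence


class PredictionRates(NamedTuple):
    sensitivity: float
    specificity: float
    false_positive_rate: float
    false_negative_rate: float


def prediction_rates(truth: Sequence[bool], predicted: Sequence[bool]) -> PredictionRates:
    """Compare tool calls against known labels (True = pathogenic, False = benign)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pathogenic and one benign variant")

    sensitivity = tp / (tp + fn)   # share of pathogenic variants called pathogenic
    specificity = tn / (tn + fp)   # share of benign variants called benign
    return PredictionRates(
        sensitivity=sensitivity,
        specificity=specificity,
        false_positive_rate=1.0 - specificity,
        false_negative_rate=1.0 - sensitivity,
    )


# Toy labels, invented for illustration only.
toy_truth = [True, True, True, True, False, False]
toy_calls = [True, True, True, False, False, True]
print(prediction_rates(toy_truth, toy_calls))
```

Computing the rates separately for each ancestry group and each tool would give the kind of per-group comparison the abstract reports, although the study's exact procedure is not described here.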
Affiliation(s)
- Kangping Zhou
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Kazzem Gheybi
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Pamela X Y Soh
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia
- Vanessa M Hayes
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, Sydney, NSW, Australia.
- Manchester Cancer Research Centre, University of Manchester, Manchester, UK.
- School of Health Systems and Public Health, Faculty of Health Sciences, University of Pretoria, Pretoria, South Africa.
2
Jin F, Cheng N, Wang L, Ye B, Xia J. FDPSM: Feature-Driven Prediction Modeling of Pathogenic Synonymous Mutations. J Chem Inf Model 2025; 65:3064-3076. [PMID: 40082068 DOI: 10.1021/acs.jcim.4c02139] [Indexed: 03/16/2025]
Abstract
Synonymous mutations, once considered to be biologically neutral, are now recognized to affect protein expression and function by altering the RNA splicing, stability, or translation efficiency. These effects can contribute to disease, making the prediction of the pathogenicity a crucial task. Computational methods have been developed to analyze the sequence features and biological functions of synonymous mutations, but existing methods face limitations, including scarcity of labeled data, reliance on other prediction tools, and insufficient representation of feature interrelationships. Here, we present FDPSM, a novel prediction method specifically designed to predict pathogenic synonymous mutations. FDPSM was trained on a robust data set of 4251 positive and negative training samples to enhance predictive accuracy. The method leveraged a comprehensive set of features, including genomic context, conservation, splicing effects, functional effects, and epigenomics, without relying on prediction scores from other mutation pathogenicity tools. Recognizing that original features alone may not fully capture the distinctions between pathogenic and benign synonymous mutations, we enhanced the feature set by extracting effective information from the interactions and distribution of these features. The experimental results showed that FDPSM significantly outperformed existing methods in predicting the pathogenicity of synonymous mutations, offering a more accurate and reliable tool for this important task. FDPSM is available at https://github.com/xialab-ahu/FDPSM.
Affiliation(s)
- Fangfang Jin
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Na Cheng
- School of Biomedical Engineering, Anhui Medical University, Hefei, Anhui 230032, China
- Lihua Wang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- School of Information Engineering, Huangshan University, Huangshan, Anhui 245041, China
- Bin Ye
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
- Junfeng Xia
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
3
Liu X, Gu L, Hao C, Xu W, Leng F, Zhang P, Li W. Systematic assessment of structural variant annotation tools for genomic interpretation. Life Sci Alliance 2025; 8:e202402949. [PMID: 39658089 PMCID: PMC11632063 DOI: 10.26508/lsa.202402949] [Received: 07/17/2024] [Revised: 11/30/2024] [Accepted: 12/02/2024] [Indexed: 12/12/2024]
Abstract
Structural variants (SVs) over 50 base pairs play a significant role in phenotypic diversity and are associated with various diseases, but their analysis is complex and resource-intensive. Numerous computational tools have been developed for SV prioritization, yet their effectiveness in biomedicine remains unclear. Here we benchmarked eight widely used SV prioritization tools, categorized into knowledge-driven (AnnotSV, ClassifyCNV) and data-driven (CADD-SV, dbCNV, StrVCTVRE, SVScore, TADA, XCNV) groups in accordance with the ACMG guidelines. We assessed their accuracy, robustness, and usability across diverse genomic contexts, biological mechanisms and computational efficiency using seven carefully curated independent datasets. Our results revealed that both groups of methods exhibit comparable effectiveness in predicting SV pathogenicity, although performance varies among tools, emphasizing the importance of selecting the appropriate tool based on specific research purposes. Furthermore, we pinpointed the potential improvement of expanding these tools for future applications. Our benchmarking framework provides a crucial evaluation method for SV analysis tools, offering practical guidance for biomedical research and facilitating the advancement of better genomic research tools.
Affiliation(s)
- Xuanshi Liu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Lei Gu
- Epigenetics Laboratory, Max-Planck Institute for Heart and Lung Research, Cardiopulmonary Institute, Bad Nauheim, Germany
- Chanjuan Hao
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Wenjian Xu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Fei Leng
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Peng Zhang
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Wei Li
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
4
Chang S, Liu JJ, Zhao Y, Pang T, Zheng X, Song Z, Zhang A, Gao X, Luo L, Guo Y, Liu J, Yang L, Lu L. Whole-genome sequencing identifies novel genes for autism in Chinese trios. Science China Life Sciences 2024; 67:2368-2381. [PMID: 39126614 DOI: 10.1007/s11427-023-2564-8] [Received: 11/30/2023] [Accepted: 03/16/2024] [Indexed: 08/12/2024]
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder with high genetic heritability but heterogeneity. Fully understanding its genetics requires whole-genome sequencing (WGS), but the ASD studies utilizing WGS data in Chinese population are limited. In this study, we present a WGS study for 334 individuals, including 112 ASD patients and their non-ASD parents. We identified 146 de novo variants in coding regions in 85 cases and 60 inherited variants in coding regions. By integrating these variants with an association model, we identified 33 potential risk genes (P<0.001) enriched in neuron and regulation related biological process. Besides the well-known ASD genes (SCN2A, NF1, SHANK3, CHD8 etc.), several high confidence genes were highlighted by a series of functional analyses, including CTNND1, DGKZ, LRP1, DDN, ZNF483, NR4A2, SMAD6, INTS1, and MRPL12, with more supported evidence from GO enrichment, expression and network analysis. We also integrated RNA-seq data to analyze the effect of the variants on the gene expression and found 12 genes in the individuals with the related variants had relatively biased expression. We further presented the clinical phenotypes of the proband carrying the risk genes in both our samples and Caucasian samples to show the effect of the risk genes on phenotype. Regarding variants in non-coding regions, a total of 74 de novo variants and 30 inherited variants were predicted as pathogenic with high confidence, which were mapped to specific genes or regulatory features. The number of de novo variants found in patient was significantly associated with the parents' ages at the birth of the child, and gender with trend. We also identified small de novo structural variants in ASD trios. The results in this study provided important evidence for understanding the genetic mechanism of ASD.
Affiliation(s)
- Suhua Chang
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China
- Chinese Academy of Medical Sciences Research Unit (No.2018RU006), Peking University, Beijing, 100191, China
- Jia Jia Liu
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China
- School of Nursing, Peking University, Beijing, 100191, China
- Yilu Zhao
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China
- Tao Pang
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China
- Xiangyu Zheng
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China
- Anyi Zhang
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China
- Xuping Gao
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China
- Lingxue Luo
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China
- Yanqing Guo
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China.
- Jing Liu
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China.
- Li Yang
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China.
- Lin Lu
- Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Peking University Sixth Hospital, Beijing, 100191, China.
- Chinese Academy of Medical Sciences Research Unit (No.2018RU006), Peking University, Beijing, 100191, China.
- National Institute on Drug Dependence, Peking University, Beijing, 100191, China.
5
Wang Y, Hon GC. Towards functional maps of non-coding variants in cancer. Front Genome Ed 2024; 6:1481443. [PMID: 39544254 PMCID: PMC11560456 DOI: 10.3389/fgeed.2024.1481443] [Received: 08/15/2024] [Accepted: 10/22/2024] [Indexed: 11/17/2024]
Abstract
Large scale cancer genomic studies in patients have unveiled millions of non-coding variants. While a handful have been shown to drive cancer development, the vast majority have unknown function. This review describes the challenges of functionally annotating non-coding cancer variants and understanding how they contribute to cancer. We summarize recently developed high-throughput technologies to address these challenges. Finally, we outline future prospects for non-coding cancer genetics to help catalyze personalized cancer therapy.
Affiliation(s)
- Yihan Wang
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Gary C. Hon
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Division of Basic Reproductive Biology Research, Department of Obstetrics and Gynecology, Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, United States
6
Biddie SC, Weykopf G, Hird EF, Friman ET, Bickmore WA. DNA-binding factor footprints and enhancer RNAs identify functional non-coding genetic variants. Genome Biol 2024; 25:208. [PMID: 39107801 PMCID: PMC11304670 DOI: 10.1186/s13059-024-03352-1] [Received: 03/06/2024] [Accepted: 07/25/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND Genome-wide association studies (GWAS) have revealed a multitude of candidate genetic variants affecting the risk of developing complex traits and diseases. However, the highlighted regions are typically in the non-coding genome, and uncovering the functional causative single nucleotide variants (SNVs) is challenging. Prioritization of variants is commonly based on genomic annotation with markers of active regulatory elements, but current approaches still poorly predict functional variants. To address this, we systematically analyze six markers of active regulatory elements for their ability to identify functional variants. RESULTS We benchmark against molecular quantitative trait loci (molQTL) from assays of regulatory element activity that identify allelic effects on DNA-binding factor occupancy, reporter assay expression, and chromatin accessibility. We identify the combination of DNase footprints and divergent enhancer RNA (eRNA) as markers for functional variants. This signature provides high precision, but with a trade-off of low recall, thus substantially reducing candidate variant sets to prioritize variants for functional validation. We present this as a framework called FINDER (Functional SNV IdeNtification using DNase footprints and eRNA). CONCLUSIONS We demonstrate the utility to prioritize variants using leukocyte count trait and analyze variants in linkage disequilibrium with a lead variant to predict a functional variant in asthma. Our findings have implications for prioritizing variants from GWAS, in development of predictive scoring algorithms, and for functionally informed fine mapping approaches.
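The precision/recall trade-off described in the abstract above can be illustrated with plain set operations: requiring a variant to carry both markers shrinks the candidate set, which tends to raise precision and lower recall. This is a hedged sketch on made-up variant IDs, not the FINDER implementation; `precision_recall` and all set contents are assumptions for illustration.

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[str], functional: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a flagged variant set against known functional variants."""
    if not flagged or not functional:
        raise ValueError("both sets must be non-empty")
    true_positives = len(flagged & functional)
    precision = true_positives / len(flagged)     # flagged variants that are functional
    recall = true_positives / len(functional)     # functional variants that were flagged
    return precision, recall


# Invented toy data: variant IDs overlapping each marker, and a benchmark set
# standing in for molQTL-supported functional variants.
in_dnase_footprint = {"v1", "v2", "v3", "v4"}
in_divergent_erna = {"v2", "v3", "v5"}
benchmark_functional = {"v2", "v3", "v4", "v6"}

# Single marker versus the combined signature (variant must carry both markers).
footprint_only = precision_recall(in_dnase_footprint, benchmark_functional)
combined = precision_recall(in_dnase_footprint & in_divergent_erna, benchmark_functional)

print("footprint only (precision, recall):", footprint_only)
print("footprint AND eRNA (precision, recall):", combined)
```

On this toy data the combined signature flags fewer variants than the footprint marker alone, so precision rises while recall falls, which mirrors the direction of the trade-off the abstract describes.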
Affiliation(s)
- Simon C Biddie
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.
- NHS Lothian, Edinburgh, UK.
- Giovanna Weykopf
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Elias T Friman
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Wendy A Bickmore
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.
7
Giovannetti A, Lazzari S, Mangoni M, Traversa A, Mazza T, Parisi C, Caputo V. Exploring non-coding genetic variability in ACE2: Functional annotation and in vitro validation of regulatory variants. Gene 2024; 915:148422. [PMID: 38570058 DOI: 10.1016/j.gene.2024.148422] [Received: 01/22/2024] [Revised: 02/23/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
The surge in human whole-genome sequencing data has facilitated the study of non-coding region variations, yet understanding their biological significance remains a challenge. We used a computational workflow to assess the regulatory potential of non-coding variants, with a particular focus on the Angiotensin Converting Enzyme 2 (ACE2) gene. This gene is crucial in physiological processes and serves as the entry point for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 19 (COVID-19). In our analysis, using data from the gnomAD population database and functional annotation, we identified 17 significant Single Nucleotide Variants (SNVs) in ACE2, particularly in its enhancers, promoters, and 3' untranslated regions (UTRs). We found preliminary evidence supporting the regulatory impact of some of these variants on ACE2 expression. Our detailed examination of two SNVs, rs147718775 and rs140394675, in the ACE2 promoter revealed that these co-occurring SNVs, when mutated, significantly enhance promoter activity, suggesting a possible increase in specific ACE2 isoform expression. This method proves effective in identifying and interpreting impactful non-coding variants, aiding in further studies and enhancing understanding of molecular bases of monogenic and complex traits.
Affiliation(s)
- Agnese Giovannetti
- Clinical Genomics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Sara Lazzari
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
- Manuel Mangoni
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Alice Traversa
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Dipartimento di Scienze della Vita, della Salute e delle Professioni Sanitarie, Università degli Studi "Link Campus University", Via del Casale di San Pio V 44, 00165 Roma, Italy.
- Tommaso Mazza
- Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Chiara Parisi
- Institute of Biochemistry and Cell Biology, CNR-National Research Council, Via Ercole Ramarini, 32, 00015 Monterotondo Scalo (RM), Italy.
- Viviana Caputo
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
8
Villani RM, McKenzie ME, Davidson AL, Spurdle AB. Regional-specific calibration enables application of computational evidence for clinical classification of 5' cis-regulatory variants in Mendelian disease. Am J Hum Genet 2024; 111:1301-1315. [PMID: 38815586 PMCID: PMC11267523 DOI: 10.1016/j.ajhg.2024.05.002] [Received: 12/21/2023] [Revised: 05/02/2024] [Accepted: 05/03/2024] [Indexed: 06/01/2024]
Abstract
To date, clinical genetic testing for Mendelian disease variants has focused heavily on exonic coding and intronic gene regions. This multi-step study was undertaken to provide an evidence base for selecting and applying computational approaches for use in clinical classification of 5' cis-regulatory region variants. Curated datasets of clinically reported disease-causing 5' cis-regulatory region variants and variants from matched genomic regions in population controls were used to calibrate six bioinformatic tools as predictors of variant pathogenicity. Likelihood ratio estimates were aligned to code weights following ClinGen recommendations for application of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) classification scheme. Considering code assignment across all reference dataset variants, performance was best for CADD (81.2%) and REMM (81.5%). Optimized thresholds provided moderate evidence toward pathogenicity (CADD, REMM) and moderate (CADD) or supporting (REMM) evidence against pathogenicity. Both sensitivity and specificity of prediction were improved when further categorizing variants based on location in an EPDnew-defined promoter region. Combining predictions (CADD, REMM, and location in a promoter region) increased specificity at the expense of sensitivity. Importantly, the optimal CADD thresholds for assigning ACMG/AMP codes PP3 (≥10) and BP4 (≤8) were vastly different from recommendations for protein-coding variants (PP3 ≥25.3; BP4 ≤22.7); CADD <22.7 would incorrectly assign BP4 for >90% of reported disease-causing cis-regulatory region variants. Our results demonstrate the need to consider a tiered approach and tailored score thresholds to optimize bioinformatic impact prediction for clinical classification of 5' cis-regulatory region variants.
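The threshold contrast reported in the abstract above can be made concrete with a small lookup. The sketch below uses only the CADD cut-offs quoted there (PP3 ≥10 and BP4 ≤8 for 5' cis-regulatory variants; PP3 ≥25.3 and BP4 ≤22.7 for protein-coding variants). The region labels, the dictionary layout and the function name `acmg_code_from_cadd` are illustrative assumptions, and evidence strength (moderate or supporting) is deliberately not modelled.

```python
from typing import Optional

# CADD cut-offs as quoted in the abstract; the region keys are hypothetical labels.
CADD_THRESHOLDS = {
    "cis_regulatory_5prime": {"pp3_min": 10.0, "bp4_max": 8.0},
    "protein_coding": {"pp3_min": 25.3, "bp4_max": 22.7},
}


def acmg_code_from_cadd(score: float, region: str) -> Optional[str]:
    """Return 'PP3', 'BP4', or None (no computational code) for a CADD score."""
    thresholds = CADD_THRESHOLDS[region]
    if score >= thresholds["pp3_min"]:
        return "PP3"   # computational evidence towards pathogenicity
    if score <= thresholds["bp4_max"]:
        return "BP4"   # computational evidence against pathogenicity
    return None        # score falls in the uninformative gap between cut-offs


# The same score lands on opposite sides depending on which calibration is used.
example_score = 15.0
print(acmg_code_from_cadd(example_score, "cis_regulatory_5prime"))
print(acmg_code_from_cadd(example_score, "protein_coding"))
```

A score of 15 would receive PP3 under the regulatory calibration but BP4 under the protein-coding one, which is the misclassification risk the abstract highlights for cis-regulatory variants.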
Affiliation(s)
- Rehan M Villani
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Maddison E McKenzie
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Aimee L Davidson
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Amanda B Spurdle
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia; University of Queensland, Brisbane, Queensland, Australia.
9
Iñiguez-Muñoz S, Llinàs-Arias P, Ensenyat-Mendez M, Bedoya-López AF, Orozco JIJ, Cortés J, Roy A, Forsberg-Nilsson K, DiNome ML, Marzese DM. Hidden secrets of the cancer genome: unlocking the impact of non-coding mutations in gene regulatory elements. Cell Mol Life Sci 2024; 81:274. [PMID: 38902506 PMCID: PMC11335195 DOI: 10.1007/s00018-024-05314-z] [Received: 07/06/2023] [Revised: 12/07/2023] [Accepted: 06/06/2024] [Indexed: 06/22/2024]
Abstract
Discoveries in the field of genomics have revealed that non-coding genomic regions are not merely "junk DNA", but rather comprise critical elements involved in gene expression. These gene regulatory elements (GREs) include enhancers, insulators, silencers, and gene promoters. Notably, new evidence shows how mutations within these regions substantially influence gene expression programs, especially in the context of cancer. Advances in high-throughput sequencing technologies have accelerated the identification of somatic and germline single nucleotide mutations in non-coding genomic regions. This review provides an overview of somatic and germline non-coding single nucleotide alterations affecting transcription factor binding sites in GREs, specifically involved in cancer biology. It also summarizes the technologies available for exploring GREs and the challenges associated with studying and characterizing non-coding single nucleotide mutations. Understanding the role of GRE alterations in cancer is essential for improving diagnostic and prognostic capabilities in the precision medicine era, leading to enhanced patient-centered clinical outcomes.
Affiliation(s)
- Sandra Iñiguez-Muñoz
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Pere Llinàs-Arias
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Miquel Ensenyat-Mendez
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Andrés F Bedoya-López
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Javier I J Orozco
- Saint John's Cancer Institute, Providence Saint John's Health Center, Santa Monica, CA, USA
- Javier Cortés
- International Breast Cancer Center (IBCC), Pangaea Oncology, Quiron Group, 08017, Barcelona, Spain
- Medica Scientia Innovation Research SL (MEDSIR), 08018, Barcelona, Spain
- Faculty of Biomedical and Health Sciences, Department of Medicine, Universidad Europea de Madrid, 28670, Madrid, Spain
- Ananya Roy
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Karin Forsberg-Nilsson
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- University of Nottingham Biodiscovery Institute, Nottingham, UK
- Maggie L DiNome
- Department of Surgery, Duke University School of Medicine, Durham, NC, USA
- Diego M Marzese
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain.
- Department of Surgery, Duke University School of Medicine, Durham, NC, USA.
10
Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023]
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Guihu Zhao
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Zhaopo Zhu
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Yijing Wang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Xudong Xiang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Shiyu Zhang
  - Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
- Tengfei Luo
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Qiao Zhou
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jian Qiu
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Beisha Tang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
- Kun Xia
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Bin Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jinchen Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
|
11
|
Yang M, Ali O, Bjørås M, Wang J. Identifying functional regulatory mutation blocks by integrating genome sequencing and transcriptome data. iScience 2023; 26:107266. [PMID: 37520692 PMCID: PMC10371843 DOI: 10.1016/j.isci.2023.107266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 04/05/2023] [Accepted: 06/28/2023] [Indexed: 08/01/2023]
Abstract
Millions of single nucleotide variants (SNVs) exist in the human genome; however, it remains challenging to identify functional SNVs associated with diseases. We propose a non-encoding SNVs analysis tool bpb3, BayesPI-BAR version 3, aiming to identify the functional mutation blocks (FMBs) by integrating genome sequencing and transcriptome data. The identified FMBs display high frequency SNVs, significant changes in transcription factors (TFs) binding affinity and are nearby the regulatory regions of differentially expressed genes. A two-level Bayesian approach with a biophysical model for protein-DNA interactions is implemented, to compute TF-DNA binding affinity changes based on clustered position weight matrices (PWMs) from over 1700 TF-motifs. The epigenetic data, such as the DNA methylome can also be integrated to scan FMBs. By testing the datasets from follicular lymphoma and melanoma, bpb3 automatically and robustly identifies FMBs, demonstrating that bpb3 can provide insight into patho-mechanisms, and therapeutic targets from transcriptomic and genomic data.
Affiliation(s)
- Mingyi Yang
  - Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
  - Department of Medical Biochemistry, Oslo University Hospital and University of Oslo, Oslo, Norway
- Omer Ali
  - Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
  - Faculty of Medicine, University of Oslo, Oslo, Norway
- Magnar Bjørås
  - Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Junbai Wang
  - Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway
|
12
|
Schubach M, Nazaretyan L, Kircher M. The Regulatory Mendelian Mutation score for GRCh38. Gigascience 2022; 12:giad024. [PMID: 37083939 PMCID: PMC10120424 DOI: 10.1093/gigascience/giad024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2022] [Revised: 01/10/2023] [Accepted: 03/21/2023] [Indexed: 04/22/2023]
Abstract
BACKGROUND Genome sequencing efforts for individuals with rare Mendelian disease have increased the research focus on the noncoding genome and the clinical need for methods that prioritize potentially disease causal noncoding variants. Some tools for assessment of variant pathogenicity as well as annotations are not available for the current human genome build (GRCh38), for which the adoption in databases, software, and pipelines was slow. RESULTS Here, we present an updated version of the Regulatory Mendelian Mutation (ReMM) score, retrained on features and variants derived from the GRCh38 genome build. Like its GRCh37 version, it achieves good performance on its highly imbalanced data. To improve accessibility and provide users with a toolbox to score their variant files and look up scores in the genome, we developed a website and API for easy score lookup. CONCLUSIONS Scores of the GRCh38 genome build are highly correlated to the prior release with a performance increase due to the better coverage of features. For prioritization of noncoding mutations in imbalanced datasets, the ReMM score performed much better than other variation scores. Prescored whole-genome files of GRCh37 and GRCh38 genome builds are cited in the article and the website; UCSC genome browser tracks, and an API are available at https://remm.bihealth.org.
Affiliation(s)
- Max Schubach
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Lusiné Nazaretyan
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Martin Kircher
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, 23562 Lübeck, Germany
|
13
|
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023]
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis regards the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground-truth and allowing us to determine software performance.
|
14
|
Schipper M, Posthuma D. Demystifying non-coding GWAS variants: an overview of computational tools and methods. Hum Mol Genet 2022; 31:R73-R83. [PMID: 35972862 PMCID: PMC9585674 DOI: 10.1093/hmg/ddac198] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/12/2022] [Revised: 08/11/2022] [Accepted: 08/11/2022] [Indexed: 02/01/2023]
Abstract
Genome-wide association studies (GWAS) have found the majority of disease-associated variants to be non-coding. Major efforts into the charting of the non-coding regulatory landscapes have allowed for the development of tools and methods which aim to aid in the identification of causal variants and their mechanism of action. In this review, we give an overview of current tools and methods for the analysis of non-coding GWAS variants in disease. We provide a workflow that allows for the accumulation of in silico evidence to generate novel hypotheses on mechanisms underlying disease and prioritize targets for follow-up study using non-coding GWAS variants. Lastly, we discuss the need for comprehensive benchmarks and novel tools for the analysis of non-coding variants.
Affiliation(s)
- Marijn Schipper
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, De Boelelaan 1105, Amsterdam 1081HV, The Netherlands
- Danielle Posthuma
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, De Boelelaan 1105, Amsterdam 1081HV, The Netherlands
|