301
Falola O, Adam Y, Ajayi O, Kumuthini J, Adewale S, Mosaku A, Samtal C, Adebayo G, Emmanuel J, Tchamga MSS, Erondu U, Nehemiah A, Rasaq S, Ajayi M, Akanle B, Oladipo O, Isewon I, Adebiyi M, Oyelade J, Adebiyi E. SysBiolPGWAS: simplifying post-GWAS analysis through the use of computational technologies and integration of diverse omics datasets. Bioinformatics 2023; 39:btac791. [PMID: 36477976 PMCID: PMC9825739 DOI: 10.1093/bioinformatics/btac791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 10/28/2022] [Accepted: 12/07/2022] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION Post-genome-wide association study (pGWAS) analysis is designed to decipher the functional consequences of significant single-nucleotide polymorphisms (SNPs) in the era of GWAS. This can be translated into research insights and clinical benefits such as the effectiveness of strategies for disease screening, treatment and prevention. However, setting up pGWAS tools can be quite complicated, and it mostly requires big data. The challenge is that scientists need sufficient experience with several of these technically complex tools in order to complete a pGWAS analysis. RESULTS We present SysBiolPGWAS, a pGWAS web application that provides comprehensive functionality for biologists and non-bioinformaticians to conduct several pGWAS analyses and overcome the above challenges. It provides unique functionalities for analysis involving multi-omics datasets and visualization using various bioinformatics tools. SysBiolPGWAS provides access to individual pGWAS tools and a novel custom pGWAS pipeline that integrates several individual pGWAS tools and data. The SysBiolPGWAS app was developed to be a one-stop shop for pGWAS analysis. It targets researchers in the area of the human genome and performs its analysis mainly on the autosomal chromosomes. AVAILABILITY AND IMPLEMENTATION The SysBiolPGWAS web app was developed using JavaScript/TypeScript web frameworks and is available at: https://spgwas.waslitbre.org/. All code is available in this GitHub repository: https://github.com/covenant-university-bioinformatics.
302
Cazares TA, Rizvi FW, Iyer B, Chen X, Kotliar M, Bejjani AT, Wayman JA, Donmez O, Wronowski B, Parameswaran S, Kottyan LC, Barski A, Weirauch MT, Prasath VBS, Miraldi ER. maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks. PLoS Comput Biol 2023; 19:e1010863. [PMID: 36719906 PMCID: PMC9917285 DOI: 10.1371/journal.pcbi.1010863] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/06/2022] [Revised: 02/10/2023] [Accepted: 01/10/2023] [Indexed: 02/01/2023] Open
Abstract
Transcription factors read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq has surpassed DNase-seq as the most widely-used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning is unfortunately the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built "maxATAC", a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the largest collection of high-performance TFBS prediction models for ATAC-seq. maxATAC performance extends to primary cells and single-cell ATAC-seq, enabling improved TFBS prediction in vivo. We demonstrate maxATAC's capabilities by identifying TFBS associated with allele-dependent chromatin accessibility at atopic dermatitis genetic risk loci.
303
Shaw P, Blizzard S, Shastri G, Kundzicz P, Curtis B, Ungar L, Koehly L. A daily diary study into the effects on mental health of COVID-19 pandemic-related behaviors. Psychol Med 2023; 53:524-532. [PMID: 37132649 PMCID: PMC8326671 DOI: 10.1017/s0033291721001896] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/08/2021] [Revised: 03/29/2021] [Accepted: 04/23/2021] [Indexed: 11/14/2022]
Abstract
BACKGROUND Recommendations for promoting mental health during the COVID-19 pandemic include maintaining social contact, through virtual rather than physical contact, moderating substance/alcohol use, and limiting news and media exposure. We seek to understand if these pandemic-related behaviors impact subsequent mental health. METHODS Daily online survey data were collected on adults during May/June 2020. Measures were of daily physical and virtual (online) contact with others; substance and media use; and indices of psychological striving, struggling and COVID-related worry. Using random-intercept cross-lagged panel analysis, dynamic within-person cross-lagged effects were separated from more static individual differences. RESULTS In total, 1148 participants completed daily surveys [657 (57.2%) females, 484 (42.1%) males; mean age 40.6 (s.d. 12.4) years]. Daily increases in news consumed increased COVID-related worrying the next day [cross-lagged estimate = 0.034 (95% CI 0.018-0.049), FDR-adjusted p = 0.00005] and vice versa [0.03 (0.012-0.048), FDR-adjusted p = 0.0017]. Increased media consumption also exacerbated subsequent psychological struggling [0.064 (0.03-0.098), FDR-adjusted p = 0.0005]. There were no significant cross-lagged effects of daily changes in social distancing or virtual contact on later mental health. CONCLUSIONS We delineate a cycle wherein a daily increase in media consumption results in a subsequent increase in COVID-related worries, which in turn increases daily media consumption. Moreover, the adverse impact of news extended to broader measures of psychological struggling. A similar dynamic did not unfold between the daily amount of physical or virtual contact and subsequent mental health. Findings are consistent with current recommendations to moderate news and media consumption in order to promote mental health.
304
Dutta D, Sen A, Satagopan J. Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations. PLoS One 2022; 17:e0276886. [PMID: 36584096 PMCID: PMC9803132 DOI: 10.1371/journal.pone.0276886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 10/16/2022] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Copy number aberrations (CNAs) in cancer affect disease outcomes by regulating molecular phenotypes, such as gene expressions, that drive important biological processes. To gain comprehensive insights into molecular biomarkers for cancer, it is critical to identify key groups of CNAs, the associated gene modules, regulatory modules, and their downstream effect on outcomes. METHODS In this paper, we demonstrate an innovative use of sparse canonical correlation analysis (sCCA) to effectively identify the ensemble of CNAs, and gene modules in the context of binary and censored disease endpoints. Our approach detects potentially orthogonal gene expression modules which are highly correlated with sets of CNA and then identifies the genes within these modules that are associated with the outcome. RESULTS Analyzing clinical and genomic data on 1,904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. We validated this finding using an independent set of 1,077 breast invasive carcinoma samples from The Cancer Genome Atlas (TCGA). Our analysis of 7 clinical endpoints identified several novel and interpretable regulatory associations, highlighting the role of CNAs in key biological pathways and processes for breast cancer. Genes significantly associated with the outcomes were enriched for early estrogen response pathway, DNA repair pathways as well as targets of transcription factors such as E2F4, MYC, and ETS1 that have recognized roles in tumor characteristics and survival. Subsequent meta-analysis across the endpoints further identified several genes through the aggregation of weaker associations. CONCLUSIONS Our findings suggest that sCCA analysis can aggregate weaker associations to identify interpretable and important genes, modules, and clinically consequential pathways.
305
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how biomedical entities are related. Gigascience 2022; 12:giad047. [PMID: 37503959 PMCID: PMC10375517 DOI: 10.1093/gigascience/giad047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2023] [Revised: 04/14/2023] [Accepted: 06/06/2023] [Indexed: 07/29/2023] Open
Abstract
BACKGROUND Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer but also how a given gene might be involved in insomnia. FINDINGS We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any 2 nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. CONCLUSION We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
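The idea of comparing an observed path count against what node degree alone would predict can be illustrated with a small sketch. This is not the hetmatpy implementation and not the null model used in the paper; it assumes a toy two-step path type (compound to gene to disease), made-up node names and edges, and a crude configuration-model-style expectation in which the chance of an edge between two nodes is taken to be proportional to the product of their degrees.

```python
def count_paths(edges_ab, edges_bc, source, target):
    """Count two-step paths source -> middle -> target (observed count)."""
    mids_from_source = {b for a, b in edges_ab if a == source}
    mids_to_target = {b for b, c in edges_bc if c == target}
    return len(mids_from_source & mids_to_target)


def expected_paths(edges_ab, edges_bc, source, target):
    """Crude degree-only expectation of the same path count.

    Assumption (not from the paper): P(edge u-v) ~ deg(u) * deg(v) / n_edges,
    applied independently to each step of the path.
    """
    deg_source = sum(1 for a, _ in edges_ab if a == source)
    deg_target = sum(1 for _, c in edges_bc if c == target)
    middles = {b for _, b in edges_ab} | {b for b, _ in edges_bc}
    total = 0.0
    for m in middles:
        deg_m_ab = sum(1 for _, b in edges_ab if b == m)
        deg_m_bc = sum(1 for b, _ in edges_bc if b == m)
        p_first = deg_source * deg_m_ab / len(edges_ab)
        p_second = deg_m_bc * deg_target / len(edges_bc)
        total += p_first * p_second
    return total


# Toy, entirely hypothetical edges (not taken from Hetionet).
compound_gene = {
    ("compound_A", "gene_1"), ("compound_A", "gene_2"),
    ("compound_B", "gene_2"), ("compound_B", "gene_3"),
}
gene_disease = {
    ("gene_1", "disease_X"), ("gene_2", "disease_X"),
    ("gene_3", "disease_Y"),
}

observed = count_paths(compound_gene, gene_disease, "compound_A", "disease_X")
expected = expected_paths(compound_gene, gene_disease, "compound_A", "disease_X")
print(observed, round(expected, 3))
```

A path type whose observed count clearly exceeds the degree-based expectation is the kind of signal the abstract describes; a real analysis would need a proper null distribution rather than this single point estimate.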
306
Farek J, Hughes D, Salerno W, Zhu Y, Pisupati A, Mansfield A, Krasheninina O, English AC, Metcalf G, Boerwinkle E, Muzny DM, Gibbs R, Khan Z, Sedlazeck FJ. xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments. Gigascience 2022; 12:giac125. [PMID: 36644891 PMCID: PMC9841152 DOI: 10.1093/gigascience/giac125] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/16/2021] [Revised: 02/24/2022] [Accepted: 12/08/2022] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the analysis of comparatively small and homogeneous sample sets. FINDINGS We have developed xAtlas, a single-sample variant caller for single-nucleotide variants (SNVs) and small insertions and deletions (indels) in NGS data. xAtlas features rapid runtimes, support for CRAM and gVCF file formats, and retraining capabilities. xAtlas reports SNVs with 99.11% recall and 98.43% precision across a reference HG002 sample at 60× whole-genome coverage in less than 2 CPU hours. Applying xAtlas to 3,202 samples at 30× whole-genome coverage from the 1000 Genomes Project achieves an average runtime of 1.7 hours per sample and a clear separation of the individual populations in principal component analysis across called SNVs. CONCLUSIONS xAtlas is a fast, lightweight, and accurate SNV and small indel calling method. Source code for xAtlas is available under a BSD 3-clause license at https://github.com/jfarek/xatlas.
307
Rasche H, Hyde C, Davis J, Gladman S, Coraor N, Bretaudeau A, Cuccuru G, Bacon W, Serrano-Solano B, Hillman-Jackson J, Hiltemann S, Zhou M, Grüning B, Stubbs A. Training Infrastructure as a Service. Gigascience 2022; 12:giad048. [PMID: 37395629 PMCID: PMC10316688 DOI: 10.1093/gigascience/giad048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/31/2023] [Accepted: 06/08/2023] [Indexed: 07/04/2023] Open
Abstract
BACKGROUND Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can support resource-intensive jobs running efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this places a significant prerequisite knowledge or labor barrier for instructors, who must spend time coordinating deployment and management of compute resources. Furthermore, with the increase of virtual and hybrid teaching, where learners are located in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses. FINDINGS Originally developed by Galaxy Europe and the Gallantries project, together with the Galaxy community, we have created Training Infrastructure-as-a-Service (TIaaS), aimed at providing user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure, which ensures jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress. CONCLUSIONS TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training.
308
English AC, Menon VK, Gibbs RA, Metcalf GA, Sedlazeck FJ. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol 2022; 23:271. [PMID: 36575487 PMCID: PMC9793516 DOI: 10.1186/s13059-022-02840-6] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 02/21/2022] [Accepted: 12/15/2022] [Indexed: 12/28/2022] Open
Abstract
The fundamental challenge of multi-sample structural variant (SV) analysis such as merging and benchmarking is identifying when two SVs are the same. Common approaches for comparing SVs were developed alongside technologies which produce ill-defined boundaries. As SV detection becomes more exact, algorithms to preserve this refined signal are needed. Here, we present Truvari (an SV comparison, annotation, and analysis toolkit) and demonstrate the effect of SV comparison choices by building population-level VCFs from 36 haplotype-resolved long-read assemblies. We observe over-merging from other SV merging approaches, which causes up to a 2.2× inflation of allele frequency relative to Truvari.
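To make the question of "when two SVs are the same" concrete, the following sketch compares two interval-type SVs (for example, deletions) by type, breakpoint distance, size similarity, and reciprocal overlap. It is only an illustration of the general kind of comparison involved, not Truvari's code or API; the `SV` class, the function names, and all threshold values are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Minimal interval-style structural variant record (hypothetical)."""
    chrom: str
    start: int
    end: int
    svtype: str

    @property
    def size(self) -> int:
        return self.end - self.start


def size_similarity(a: SV, b: SV) -> float:
    """Ratio of the smaller size to the larger size, in [0, 1]."""
    larger = max(a.size, b.size)
    return min(a.size, b.size) / larger if larger else 1.0


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smallest fraction of either SV that is covered by the other."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / a.size, overlap / b.size)


def same_sv(a: SV, b: SV, max_dist: int = 500,
            min_size_sim: float = 0.7, min_overlap: float = 0.5) -> bool:
    """Decide whether two SVs match under illustrative thresholds."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.start - b.start) > max_dist or abs(a.end - b.end) > max_dist:
        return False
    return (size_similarity(a, b) >= min_size_sim
            and reciprocal_overlap(a, b) >= min_overlap)


call_1 = SV("chr1", 1000, 2000, "DEL")
call_2 = SV("chr1", 1010, 1990, "DEL")   # nearly identical deletion
call_3 = SV("chr1", 1000, 1300, "DEL")   # same start, much smaller
print(same_sv(call_1, call_2), same_sv(call_1, call_3))
```

Loosening thresholds such as `max_dist` or `min_size_sim` merges more calls into one, which is the kind of choice the abstract links to over-merging and inflated allele frequencies.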
309
Wei A, Wu H. Mammalian DNA methylome dynamics: mechanisms, functions and new frontiers. Development 2022; 149:dev182683. [PMID: 36519514 PMCID: PMC10108609 DOI: 10.1242/dev.182683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
DNA methylation is a highly conserved epigenetic modification that plays essential roles in mammalian gene regulation, genome stability and development. Despite being primarily considered a stable and heritable epigenetic silencing mechanism at heterochromatic and repetitive regions, whole genome methylome analysis reveals that DNA methylation can be highly cell-type specific and dynamic within proximal and distal gene regulatory elements during early embryonic development, stem cell differentiation and reprogramming, and tissue maturation. In this Review, we focus on the mechanisms and functions of regulated DNA methylation and demethylation, highlighting how these dynamics, together with crosstalk between DNA methylation and histone modifications at distinct regulatory regions, contribute to mammalian development and tissue maturation. We also discuss how recent technological advances in single-cell and long-read methylome sequencing, along with targeted epigenome-editing, are enabling unprecedented high-resolution and mechanistic dissection of DNA methylome dynamics.
310
Mishra A, Ruano SH, Saha PK, Pennington KA. A novel model of gestational diabetes: Acute high fat high sugar diet results in insulin resistance and beta cell dysfunction during pregnancy in mice. PLoS One 2022; 17:e0279041. [PMID: 36520818 PMCID: PMC9754171 DOI: 10.1371/journal.pone.0279041] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/17/2022] [Accepted: 11/29/2022] [Indexed: 12/23/2022] Open
Abstract
Gestational diabetes mellitus (GDM) affects 7-18% of all pregnancies. Despite its high prevalence, there is no widely accepted animal model. To address this, we recently developed a mouse model of GDM. The goal of this work was to further characterize this animal model by assessing insulin resistance and beta cell function. Mice were randomly assigned to either a control (CD) or a high fat, high sugar (HFHS) diet and mated 1 week later. At day 0 (day of mating), mice were fasted and intraperitoneal insulin tolerance tests (ipITT) were performed. Mice were then euthanized and pancreata were collected for histological analysis. Euglycemic hyperinsulinemic clamp experiments were performed on day 13.5 of pregnancy to assess insulin resistance. Beta cell function was assessed by glucose stimulated insulin secretion (GSIS) assay performed on day 0, 13.5 and 17.5 of pregnancy. At day 0, insulin tolerance and beta cell numbers were not different. At day 13.5, glucose infusion and disposal rates were significantly decreased (p<0.05) in pregnant (P) HFHS animals, suggesting development of insulin resistance in P HFHS dams. Placental and fetal glucose uptake was significantly increased (p<0.01) in P HFHS dams at day 13.5 of pregnancy, and by day 17.5 of pregnancy fetal weights were increased (p<0.05) in P HFHS dams compared to P CD dams. Basal and secreted insulin levels were increased in HFHS-fed females at day 0; however, at day 13.5 and 17.5 GSIS was decreased (p<0.05) in P HFHS dams. In conclusion, this animal model results in insulin resistance and beta cell dysfunction by mid-pregnancy, further validating its relevance in studying the pathophysiology of GDM.
311
Dashnow H, Pedersen BS, Hiatt L, Brown J, Beecroft SJ, Ravenscroft G, LaCroix AJ, Lamont P, Roxburgh RH, Rodrigues MJ, Davis M, Mefford HC, Laing NG, Quinlan AR. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci. Genome Biol 2022; 23:257. [PMID: 36517892 PMCID: PMC9753380 DOI: 10.1186/s13059-022-02826-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 11/29/2021] [Accepted: 11/30/2022] [Indexed: 12/23/2022] Open
Abstract
Expansions of short tandem repeats (STRs) cause many rare diseases. Expansion detection is challenging with short-read DNA sequencing data since supporting reads are often mapped incorrectly. Detection is particularly difficult for "novel" STRs, which include new motifs at known loci or STRs absent from the reference genome. We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. STRling is sensitive to known STR disease loci, has a low false discovery rate, and resolves novel STR expansions to base-pair position accuracy. It is fast, scalable, open-source, and available at: github.com/quinlan-lab/STRling.
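As a rough illustration of how k-mer counting can flag a read dominated by a tandem repeat, the sketch below counts all k-mers in a read, collapses rotations of the same repeat unit (CAG, AGC, and GCA describe the same repeat), and reports the most frequent unit with the fraction of k-mer windows it explains. This is a simplified stand-in written for this listing, not STRling's algorithm or interface; the function names and the example reads are hypothetical.

```python
from collections import Counter


def canonical_rotation(kmer: str) -> str:
    """Return the lexicographically smallest rotation of a k-mer."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_repeat_unit(read: str, k: int) -> tuple[str, float]:
    """Find the most common rotation-collapsed k-mer in a read.

    Returns the repeat unit and the fraction of k-mer windows it accounts
    for; a fraction near 1.0 suggests the read is mostly one tandem repeat.
    """
    counts = Counter(
        canonical_rotation(read[i:i + k]) for i in range(len(read) - k + 1)
    )
    if not counts:
        return "", 0.0
    unit, n = counts.most_common(1)[0]
    return unit, n / sum(counts.values())


repeat_read = "CAG" * 10          # hypothetical read made of a pure CAG repeat
mixed_read = "ACGTTGCA"           # hypothetical read with no dominant repeat
print(dominant_repeat_unit(repeat_read, 3))
print(dominant_repeat_unit(mixed_read, 3))
```

Reads scoring high for one unit can be kept as informative even when they map poorly, which is the motivation the abstract gives for counting k-mers rather than relying on alignment alone.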
312
Salinas SA, Mace EM, Conte MI, Park CS, Li Y, Rosario-Sepulveda JI, Mahapatra S, Moore EK, Hernandez ER, Chinn IK, Reed AE, Lee BJ, Frumovitz A, Gibbs RA, Posey JE, Forbes Satter LR, Thatayatikom A, Allenspach EJ, Wensel TG, Lupski JR, Lacorazza HD, Orange JS. An ELF4 hypomorphic variant results in NK cell deficiency. JCI Insight 2022; 7:e155481. [PMID: 36477361 PMCID: PMC9746917 DOI: 10.1172/jci.insight.155481] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/04/2021] [Accepted: 10/13/2022] [Indexed: 12/12/2022] Open
Abstract
NK cell deficiencies (NKD) are a type of primary immune deficiency in which the major immunologic abnormality affects NK cell number, maturity, or function. Since NK cells contribute to immune defense against virally infected cells, patients with NKD experience higher susceptibility to chronic, recurrent, and fatal viral infections. An individual with recurrent viral infections and mild hypogammaglobulinemia was identified to have an X-linked damaging variant in the transcription factor gene ELF4. The variant does not decrease expression but disrupts ELF4 protein interactions and DNA binding, reducing transcriptional activation of target genes and selectively impairing ELF4 function. Corroborating previous murine models of ELF4 deficiency (Elf4-/-) and using a knockdown human NK cell line, we determined that ELF4 is necessary for normal NK cell development, terminal maturation, and function. Through characterization of the NK cells of the proband, expression of the proband's variant in Elf4-/- mouse hematopoietic precursor cells, and a human in vitro NK cell maturation model, we established this ELF4 variant as a potentially novel cause of NKD.
313
Carrington B, Ramanagoudr-Bhojappa R, Bresciani E, Han TU, Sood R. A robust pipeline for efficient knock-in of point mutations and epitope tags in zebrafish using fluorescent PCR based screening. BMC Genomics 2022; 23:810. [PMID: 36476416 PMCID: PMC9730659 DOI: 10.1186/s12864-022-08971-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 10/26/2022] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Genome editing using CRISPR/Cas9 has become a powerful tool in zebrafish to generate targeted gene knockout models. However, its use for targeted knock-in remains challenging due to the inefficient homology-directed repair (HDR) pathway in zebrafish, highlighting the need for efficient and cost-effective screening methods. RESULTS Here, we present our fluorescent PCR and capillary electrophoresis based screening approach for knock-in using a single-stranded oligodeoxynucleotide donor (ssODN) as a repair template for the targeted insertion of epitope tags, or single nucleotide changes to recapitulate pathogenic human alleles. For the insertion of epitope tags, we took advantage of the expected change in size of the PCR product. For point mutations, we combined fluorescent PCR with restriction fragment length polymorphism (RFLP) analysis to distinguish the fish with the knock-in allele. As a proof-of-principle, we present our data on the generation of fish lines with insertion of a FLAG tag at the tcnba locus, an HA tag at the gata2b locus, and a point mutation observed in Gaucher disease patients in the gba gene. Despite the low number of germline transmitting founders (1-5%), combining our screening methods with prioritization of founder fish by fin biopsies allowed us to establish stable knock-in lines by screening 12 or fewer fish per gene. CONCLUSIONS We have established a robust pipeline for the generation of zebrafish models with precise integration of small DNA sequences and point mutations at the desired sites in the genome. Our screening method is very efficient and easy to implement as it is PCR-based and only requires access to a capillary sequencer.
314
Wu RR, Myers RA, Neuner J, McCarty C, Haller IV, Harry M, Fulda KG, Dimmock D, Rakhra-Burris T, Buchanan A, Ginsburg GS, Orlando LA. Implementation-effectiveness trial of systematic family health history based risk assessment and impact on clinical disease prevention and surveillance activities. BMC Health Serv Res 2022; 22:1486. [PMID: 36474257 PMCID: PMC9727967 DOI: 10.1186/s12913-022-08879-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/19/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Systematically assessing disease risk can improve population health by identifying those eligible for enhanced prevention/screening strategies. This study aims to determine the clinical impact of a systematic risk assessment in diverse primary care populations. METHODS Hybrid implementation-effectiveness trial of a family health history-based health risk assessment (HRA) tied to risk-based guideline recommendations enrolling from 2014-2017 with 12 months of post-intervention survey data and 24 months of electronic medical record (EMR) data capture. SETTING 19 primary care clinics at four geographically and culturally diverse U.S. healthcare systems. PARTICIPANTS Any English or Spanish-speaking adult with an upcoming appointment at an enrolling clinic. METHODS A personal and family health history based HRA with integrated guideline-based clinical decision support (CDS) was completed by each participant prior to their appointment. Risk reports were provided to patients and providers to discuss at their clinical encounter. OUTCOMES Provider and patient discussion and provider uptake (i.e. ordering) and patient uptake (i.e. recommendation completion) of CDS recommendations. MEASURES Patient and provider surveys and EMR data. RESULTS One thousand eight hundred twenty nine participants (mean age 56.2 [SD 13.9], 69.6% female) completed the HRA and had EMR data available for analysis. 762 (41.6%) received a recommendation (29.7% for genetic counseling (GC); 15.2% for enhanced breast/colon cancer screening). Those with recommendations frequently discussed disease risk with their provider (8.7%-38.2% varied by recommendation, p-values ≤ 0.004). In the GC subgroup, provider discussions increased referrals to counseling (44.4% with vs. 5.9% without, P < 0.001). Recommendation uptake was highest for colon cancer screening (provider = 67.9%; patient = 86.8%) and lowest for breast cancer chemoprevention (0%). CONCLUSIONS Systematic health risk assessment revealed that almost half the population were at increased disease risk based on guidelines. Risk identification resulted in shared discussions between participants and providers but variable clinical action uptake depending upon the recommendation. Understanding the barriers and facilitators to uptake by both patients and providers will be essential for optimizing HRA tools and achieving their promise of improving population health. TRIAL REGISTRATION Clinicaltrials.gov number NCT01956773, registered 10/8/2013.
315
Braschi B, Bruford EA, Cavanagh AT, Neuman SD, Bashirullah A. The bridge-like lipid transfer protein (BLTP) gene group: introducing new nomenclature based on structural homology indicating shared function. Hum Genomics 2022; 16:66. [PMID: 36461115 PMCID: PMC9719229 DOI: 10.1186/s40246-022-00439-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/13/2022] [Accepted: 11/22/2022] [Indexed: 12/03/2022] Open
Abstract
The HUGO Gene Nomenclature Committee assigns unique symbols and names to human genes. The use of approved nomenclature enables effective communication between researchers, and there are multiple examples of how the usage of unapproved alias symbols can lead to confusion. We discuss here a recent nomenclature update (May 2022) for a set of genes that encode proteins with a shared repeating β-groove domain. Some of the proteins encoded by genes in this group have already been shown to function as lipid transporters. By working with researchers in the field, we have been able to introduce a new root symbol (BLTP, which stands for "bridge-like lipid transfer protein") for this domain-based gene group. This new nomenclature not only reflects the shared domain in these proteins, but also takes into consideration the mounting evidence of a shared lipid transport function.
316
Chen X, Chen L, Kürten CHL, Jabbari F, Vujanovic L, Ding Y, Lu B, Lu K, Kulkarni A, Tabib T, Lafyatis R, Cooper GF, Ferris R, Lu X. An individualized causal framework for learning intercellular communication networks that define microenvironments of individual tumors. PLoS Comput Biol 2022; 18:e1010761. [PMID: 36548438 PMCID: PMC9822106 DOI: 10.1371/journal.pcbi.1010761] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/04/2022] [Revised: 01/06/2023] [Accepted: 11/26/2022] [Indexed: 12/24/2022] Open
Abstract
Cells within a tumor microenvironment (TME) dynamically communicate and influence each other's cellular states through an intercellular communication network (ICN). In cancers, intercellular communications underlie immune evasion mechanisms of individual tumors. We developed an individualized causal analysis framework for discovering tumor specific ICNs. Using head and neck squamous cell carcinoma (HNSCC) tumors as a testbed, we first mined single-cell RNA-sequencing data to discover gene expression modules (GEMs) that reflect the states of transcriptomic processes within tumor and stromal single cells. By deconvoluting bulk transcriptomes of HNSCC tumors profiled by The Cancer Genome Atlas (TCGA), we estimated the activation states of these transcriptomic processes in individual tumors. Finally, we applied individualized causal network learning to discover an ICN within each tumor. Our results show that cellular states of cells in TMEs are coordinated through ICNs that enable multi-way communications among epithelial, fibroblast, endothelial, and immune cells. Further analyses of individual ICNs revealed structural patterns that were shared across subsets of tumors, leading to the discovery of 4 different subtypes of networks that underlie disparate TMEs of HNSCC. Patients with distinct TMEs exhibited significantly different clinical outcomes. Our results show that the capability of estimating individual ICNs reveals heterogeneity of ICNs and sheds light on the importance of intercellular communication in impacting disease development and progression.
|
317
|
Zhao Y, Brush M, Wang C, Wagner AH, Liu H, Freimuth RR. Leveraging a pharmacogenomics knowledgebase to formulate a drug response phenotype terminology for genomic medicine. Bioinformatics 2022; 38:5279-5287. [PMID: 36222570 PMCID: PMC9710557 DOI: 10.1093/bioinformatics/btac646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2021] [Revised: 05/31/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Despite the increasing evidence of the utility of genomic medicine in clinical practice, systematically integrating genomic medicine information and knowledge into clinical systems with a high level of consistency, scalability and computability remains challenging. A comprehensive terminology is required for relevant concepts and the associated knowledge model for representing relationships. In this study, we leveraged PharmGKB, a comprehensive pharmacogenomics (PGx) knowledgebase, to formulate a terminology for drug response phenotypes that can represent relationships between genetic variants and treatments. We evaluated coverage of the terminology through manual review of a randomly selected subset of 200 sentences extracted from genetic reports that contained concepts for 'Genes and Gene Products' and 'Treatments'. RESULTS Results showed that our proposed drug response phenotype terminology could cover 96% of the drug response phenotypes in genetic reports. Among 18 653 sentences that contained both 'Genes and Gene Products' and 'Treatments', 3011 sentences could be mapped to a drug response phenotype in our proposed terminology, among which the most discussed drug response phenotypes were response (994), sensitivity (829) and survival (332). In addition, we were able to re-analyze genetic report context incorporating the proposed terminology and enrich our previously proposed PGx knowledge model to reveal relationships between genetic variants and treatments. In conclusion, we proposed a drug response phenotype terminology that enhanced structured knowledge representation of genomic medicine. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
318
|
Jefferson C, Watson E, Certa JM, Gordon KS, Park LS, D’Souza G, Benning L, Abraham AG, Agil D, Napravnik S, Silverberg MJ, Leyden WA, Skarbinski J, Williams C, Althoff KN, Horberg MA. Differences in COVID-19 testing and adverse outcomes by race, ethnicity, sex, and health system setting in a large diverse US cohort. PLoS One 2022; 17:e0276742. [PMID: 36417366 PMCID: PMC9683575 DOI: 10.1371/journal.pone.0276742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 09/08/2022] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Racial/ethnic disparities during the first six months of the COVID-19 pandemic led to differences in COVID-19 testing and adverse outcomes. We examine differences in testing and adverse outcomes by race/ethnicity and sex across a geographically diverse and system-based COVID-19 cohort collaboration. METHODS Observational study among adults (≥18 years) within six US cohorts from March 1, 2020 to August 31, 2020 using data from electronic health records and patient reporting. Race/ethnicity and sex as risk factors were primary exposures, with health system type (integrated health system, academic health system, or interval cohort) as secondary. Outcomes were the proportions of patients tested for SARS-CoV-2 and testing positive, and hospitalization and death attributed to COVID-19. Relative risk ratios (RR) with 95% confidence intervals quantified associations between exposures and main outcomes. RESULTS 5,958,908 patients were included. Hispanic patients had the highest proportions of SARS-CoV-2 testing (16%) and positivity (18%), while Asian/Pacific Islander patients had the lowest proportion tested (11%) and White patients had the lowest positivity rate (5%). Men had a lower likelihood of testing (RR = 0.90 [0.89-0.90]) and a higher positivity risk (RR = 1.16 [1.14-1.18]) compared to women. Black patients were more likely to have COVID-19-related hospitalizations (RR = 1.36 [1.28-1.44]) and death (RR = 1.17 [1.03-1.32]) compared with White patients. Men were more likely to be hospitalized (RR = 1.30 [1.16-1.22]) or die (RR = 1.70 [1.53-1.89]) compared to women. These racial/ethnic and sex differences were reflected in both health system types. CONCLUSIONS This study supports evidence of disparities by race/ethnicity and sex during the COVID-19 pandemic that persisted even in healthcare settings with reduced barriers to accessing care. Further research is needed to understand and prevent the drivers that resulted in higher burdens of morbidity among certain Black patients and men.
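The abstract reports relative risks with 95% confidence intervals but does not say how the intervals were computed. As a rough illustration only, the sketch below uses the textbook log-scale Wald interval for an unadjusted relative risk. The function name and the counts are hypothetical and are not taken from the study, whose estimates may well come from adjusted models.

```python
import math


def relative_risk(events_exposed, n_exposed, events_ref, n_ref, z=1.96):
    """Return (RR, lower, upper) for an unadjusted relative risk.

    Uses the log-scale Wald interval; z=1.96 gives an approximate 95% CI.
    """
    risk_exposed = events_exposed / n_exposed
    risk_ref = events_ref / n_ref
    rr = risk_exposed / risk_ref
    # Standard error of ln(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts chosen for illustration, not data from the cohort.
rr, lower, upper = relative_risk(
    events_exposed=90, n_exposed=1000, events_ref=100, n_ref=1000
)
print(f"RR = {rr:.2f} [{lower:.2f}-{upper:.2f}]")
```

With an interval built this way, the point estimate always lies between the lower and upper bounds, which is a quick consistency check when reading reported values.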
|
319
|
Bowling KM, Thompson ML, Kelly MA, Scollon S, Slavotinek AM, Powell BC, Kirmse BM, Hendon LG, Brothers KB, Korf BR, Cooper GM, Greally JM, Hurst ACE. Return of non-ACMG recommended incidental genetic findings to pediatric patients: considerations and opportunities from experiences in genomic sequencing. Genome Med 2022; 14:131. [PMID: 36414972 PMCID: PMC9682742 DOI: 10.1186/s13073-022-01139-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 11/10/2022] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND The uptake of exome/genome sequencing has introduced unexpected testing results (incidental findings) that have become a major challenge for both testing laboratories and providers. While the American College of Medical Genetics and Genomics has outlined guidelines for laboratory management of clinically actionable secondary findings, debate remains as to whether incidental findings should be returned to patients, especially those representing pediatric populations. METHODS The Sequencing Analysis and Diagnostic Yield working group in the Clinical Sequencing Evidence-Generating Research Consortium has collected a cohort of pediatric patients found to harbor a genomic sequencing-identified non-ACMG-recommended incidental finding. The incidental variants were not thought to be associated with the indication for testing and were disclosed to patients and families. RESULTS In total, 23 non-ACMG-recommended incidental findings were identified in 21 pediatric patients included in the study. These findings span four different research studies/laboratories and demonstrate differences in incidental finding return rate across study sites. We summarize specific cases to highlight core considerations that surround identification and return of incidental findings (uncertainty of disease onset, disease severity, age of onset, clinical actionability, and personal utility), and suggest that interpretation of incidental findings in pediatric patients can be difficult given evolving phenotypes. Furthermore, return of incidental findings can benefit patients and providers, but does present challenges. CONCLUSIONS While there may be considerable benefit to return of incidental genetic findings, these findings can be burdensome to providers and present risk to patients. It is important that laboratories conducting genomic testing establish internal guidelines in anticipation of detection. Moreover, cross-laboratory guidelines may aid in reducing the potential for policy heterogeneity across laboratories as it relates to incidental finding detection and return. However, future discussion is required to determine whether cohesive guidelines or policy statements are warranted.
|
320
|
Ochieng J, Kwagala B, Barugahare J, Möller M, Moodley K. Feedback of individual genetic and genomics research results: A qualitative study involving grassroots communities in Uganda. PLoS One 2022; 17:e0267375. [PMID: 36399445 PMCID: PMC9674126 DOI: 10.1371/journal.pone.0267375] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/06/2022] [Accepted: 10/10/2022] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Genetics and genomics research (GGR) is associated with several challenges including, but not limited to, methods and implications of sharing research findings with participants and their family members, issues of confidentiality, and ownership of data obtained from samples. Additionally, GGR holds significant potential risk for social and psychological harms. Considerable research has been conducted globally, and has advanced the debate on return of genetic and genomics testing results. However, such investigations are limited in the African setting, including Uganda where research ethics guidance on return of results is deficient or suboptimal at best. The objective of this study was to assess perceptions of grassroots communities on whether and how feedback of individual genetics and genomics testing results should occur in Uganda with a view to improving ethics guidance. METHODS This was a cross-sectional study that employed a qualitative exploratory approach. Five deliberative focus group discussions (FGDs) were conducted with 42 participants from grassroots communities representing three major ethnic groupings. These were rural settings and the majority of participants were subsistence farmers with limited or no exposure to GGR. Data were analysed through thematic analysis, with both deductive and inductive approaches applied to interrogate predetermined themes and to identify any emerging themes. NVivo software (QSR International 2020) was used to support data analysis and illustrative quotes were extracted. RESULTS All the respondents were willing to participate in GGR and receive feedback of results conditional upon a health benefit. The main motivation was diagnostic and therapeutic benefits as well as facilitating future health planning. Thematic analysis identified four themes and several sub-themes including 1) the need to know health status; 2) paternity information as a benefit and risk; 3) ethical considerations for feedback of findings; and 4) extending feedback of genetics findings to family and community. CONCLUSION Participation in hypothetical GGR as well as feedback of results is acceptable to individuals in grassroots communities. However, the strong therapeutic and/or diagnostic misconception linked to GGR is concerning given that hopes for therapeutic and/or diagnostic benefit are unfounded. Viewing GGR as an opportunity to confirm or dispute paternity was another interesting perception. These findings carry profound implications for consent processes, genetic counselling and research ethics guidance. Privacy and confidentiality, benefits, risks as well as implications for sharing need to be considered for such feedback of results to be conducted appropriately.
|
321
|
Rush CM, Blanchard Z, Polaski JT, Osborne KS, Osby K, Vahrenkamp JM, Yang CH, Lum DH, Hagan CR, Leslie KK, Pufall MA, Thiel KW, Gertz J. Characterization of HCI-EC-23 a novel estrogen- and progesterone-responsive endometrial cancer cell line. Sci Rep 2022; 12:19731. [PMID: 36396974 PMCID: PMC9672046 DOI: 10.1038/s41598-022-24211-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/11/2022] [Indexed: 11/18/2022] Open
Abstract
Most endometrial cancers express the hormone receptor estrogen receptor alpha (ER) and are driven by excess estrogen signaling. However, evaluation of the estrogen response in endometrial cancer cells has been limited by the availability of hormonally responsive in vitro models, with one cell line, Ishikawa, being used in most studies. Here, we describe a novel, adherent endometrioid endometrial cancer (EEC) cell line model, HCI-EC-23. We show that HCI-EC-23 retains ER expression and that ER functionally responds to estrogen induction over a range of passages. We also demonstrate that this cell line retains paradoxical activation of ER by tamoxifen, which is also observed in Ishikawa and is consistent with clinical data. The mutational landscape shows that HCI-EC-23 is mutated at many of the commonly altered genes in EEC, has relatively few copy-number alterations, and is microsatellite instability-high (MSI-high). In vitro proliferation of HCI-EC-23 is strongly reduced upon combination estrogen and progesterone treatment. HCI-EC-23 exhibits strong estrogen dependence for tumor growth in vivo and tumor size is reduced by combination estrogen and progesterone treatment. Molecular characterization of estrogen induction in HCI-EC-23 revealed hundreds of estrogen-responsive genes that significantly overlapped with those regulated in Ishikawa. Analysis of ER genome binding identified similar patterns in HCI-EC-23 and Ishikawa, although ER exhibited more bound sites in Ishikawa. This study demonstrates that HCI-EC-23 is an estrogen- and progesterone-responsive cell line model that can be used to study the hormonal aspects of endometrial cancer.
|
322
|
Sirén J, Paten B. GBZ file format for pangenome graphs. Bioinformatics 2022; 38:5012-5018. [PMID: 36179091 PMCID: PMC9665857 DOI: 10.1093/bioinformatics/btac656] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/12/2022] [Revised: 09/06/2022] [Accepted: 09/30/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Pangenome graphs representing aligned genome assemblies are being shared in the text-based Graphical Fragment Assembly format. As the number of assemblies grows, there is a need for a file format that can store the highly repetitive data space efficiently. RESULTS We propose the GBZ file format based on data structures used in the Giraffe short-read aligner. The format provides good compression, and the files can be efficiently loaded into in-memory data structures. We provide compression and decompression tools and libraries for using GBZ graphs, and we show that they can be efficiently used on a variety of systems. AVAILABILITY AND IMPLEMENTATION C++ and Rust implementations are available at https://github.com/jltsiren/gbwtgraph and https://github.com/jltsiren/gbwt-rs, respectively. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
323
|
Liang J, Wang H, Cade BE, Kurniansyah N, He KY, Lee J, Sands SA, Brody JA, Chen H, Gottlieb DJ, Evans DS, Guo X, Gharib SA, Hale L, Hillman DR, Lutsey PL, Mukherjee S, Ochs-Balcom HM, Palmer LJ, Purcell S, Saxena R, Patel SR, Stone KL, Tranah GJ, Boerwinkle E, Lin X, Liu Y, Psaty BM, Vasan RS, Manichaikul A, Rich SS, Rotter JI, Sofer T, Redline S, Zhu X. Targeted Genome Sequencing Identifies Multiple Rare Variants in Caveolin-1 Associated with Obstructive Sleep Apnea. Am J Respir Crit Care Med 2022; 206:1271-1280. [PMID: 35822943 PMCID: PMC9746833 DOI: 10.1164/rccm.202203-0618oc] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/30/2022] [Accepted: 07/06/2022] [Indexed: 01/04/2023] Open
Abstract
Rationale: Obstructive sleep apnea (OSA) is a common disorder associated with increased risk for cardiovascular disease, diabetes, and premature mortality. There is strong clinical and epidemiologic evidence supporting the importance of genetic factors influencing OSA but limited data implicating specific genes. Objectives: To search for rare variants contributing to OSA severity. Methods: Leveraging high-depth genomic sequencing data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and imputed genotype data from multiple population-based studies, we performed linkage analysis in the CFS (Cleveland Family Study), followed by multistage gene-based association analyses in independent cohorts for apnea-hypopnea index (AHI) in a total of 7,708 individuals of European ancestry. Measurements and Main Results: Linkage analysis in the CFS identified a suggestive linkage peak on chromosome 7q31 (LOD = 2.31). Gene-based analysis identified 21 noncoding rare variants in CAV1 (Caveolin-1) associated with lower AHI after accounting for multiple comparisons (P = 7.4 × 10⁻⁸). These noncoding variants together significantly contributed to the linkage evidence (P < 10⁻³). Follow-up analysis revealed significant associations between these variants and increased CAV1 expression, and increased CAV1 expression in peripheral monocytes was associated with lower AHI (P = 0.024) and higher minimum overnight oxygen saturation (P = 0.007). Conclusions: Rare variants in CAV1, a membrane-scaffolding protein essential in multiple cellular and metabolic functions, are associated with higher CAV1 gene expression and lower OSA severity, suggesting a novel target for modulating OSA severity.
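For readers unfamiliar with LOD scores, the sketch below converts a LOD score into an approximate pointwise p-value. It relies on the standard large-sample approximation that 2 ln(10) × LOD follows a chi-square distribution with one degree of freedom, halved for a one-sided test. This is an illustrative assumption and not necessarily how the study assessed significance; the function name is hypothetical, and the result is pointwise, not corrected for genome-wide testing.

```python
import math


def lod_to_pvalue(lod):
    """Approximate pointwise p-value for a LOD score (1 df, one-sided)."""
    # A LOD score is a log10 likelihood ratio; rescale it to the usual
    # likelihood-ratio chi-square statistic.
    chi_sq = 2.0 * math.log(10.0) * lod
    # Survival function of a chi-square variable with 1 df is
    # erfc(sqrt(x / 2)); halve it because linkage tests are one-sided.
    return 0.5 * math.erfc(math.sqrt(chi_sq / 2.0))


# LOD value quoted in the abstract above.
p_suggestive = lod_to_pvalue(2.31)
print(f"LOD 2.31 -> pointwise p of about {p_suggestive:.1e}")
```

Under the same approximation a LOD of 3 corresponds to a pointwise p-value of roughly 1 in 10,000, which is why a peak of 2.31 is described as suggestive.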
|
324
|
DeVries AA, Dennis J, Tyrer JP, Peng PC, Coetzee SG, Reyes AL, Plummer JT, Davis BD, Chen SS, Dezem FS, Aben KKH, Anton-Culver H, Antonenkova NN, Beckmann MW, Beeghly-Fadiel A, Berchuck A, Bogdanova NV, Bogdanova-Markov N, Brenton JD, Butzow R, Campbell I, Chang-Claude J, Chenevix-Trench G, Cook LS, DeFazio A, Doherty JA, Dörk T, Eccles DM, Eliassen AH, Fasching PA, Fortner RT, Giles GG, Goode EL, Goodman MT, Gronwald J, Håkansson N, Hildebrandt MAT, Huff C, Huntsman DG, Jensen A, Kar S, Karlan BY, Khusnutdinova EK, Kiemeney LA, Kjaer SK, Kupryjanczyk J, Labrie M, Lambrechts D, Le ND, Lubiński J, May T, Menon U, Milne RL, Modugno F, Monteiro AN, Moysich KB, Odunsi K, Olsson H, Pearce CL, Pejovic T, Ramus SJ, Riboli E, Riggan MJ, Romieu I, Sandler DP, Schildkraut JM, Setiawan VW, Sieh W, Song H, Sutphen R, Terry KL, Thompson PJ, Titus L, Tworoger SS, Van Nieuwenhuysen E, Edwards DV, Webb PM, Wentzensen N, Whittemore AS, Wolk A, Wu AH, Ziogas A, Freedman ML, Lawrenson K, Pharoah PDP, Easton DF, Gayther SA, Jones MR. Copy Number Variants Are Ovarian Cancer Risk Alleles at Known and Novel Risk Loci. J Natl Cancer Inst 2022; 114:1533-1544. [PMID: 36210504 PMCID: PMC9949586 DOI: 10.1093/jnci/djac160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 04/13/2022] [Accepted: 08/18/2022] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Known risk alleles for epithelial ovarian cancer (EOC) account for approximately 40% of the heritability for EOC. Copy number variants (CNVs) have not been investigated as EOC risk alleles in a large population cohort. METHODS Single nucleotide polymorphism array data from 13 071 EOC cases and 17 306 controls of White European ancestry were used to identify CNVs associated with EOC risk using a rare admixture maximum likelihood test for gene burden and a by-probe ratio test. We performed enrichment analysis of CNVs at known EOC risk loci and functional biofeatures in ovarian cancer-related cell types. RESULTS We identified statistically significant risk associations with CNVs at known EOC risk genes: BRCA1 (P_EOC = 1.60E-21; OR_EOC = 8.24), RAD51C (P_HGSOC = 5.5E-4; OR_HGSOC = 5.74, deletion), and BRCA2 (P_HGSOC = 7.0E-4; OR_HGSOC = 3.31, deletion), where HGSOC denotes high-grade serous ovarian cancer and OR the odds ratio. Four suggestive associations (P < .001) were identified for rare CNVs. Risk-associated CNVs were enriched (P < .05) at known EOC risk loci identified by genome-wide association study. Noncoding CNVs were enriched in active promoters and insulators in EOC-related cell types. CONCLUSIONS CNVs in BRCA1 have been previously reported in smaller studies, but their observed frequency in this large population-based cohort, along with the CNVs observed at BRCA2 and RAD51C gene loci in EOC cases, suggests that these CNVs are potentially pathogenic and may contribute to the spectrum of disease-causing mutations in these genes. CNVs are likely to occur in a wider set of susceptibility regions, with potential implications for clinical genetic testing and disease prevention.
Grants
- P01 CA017054 NCI NIH HHS
- N01 CN025403 NCI NIH HHS
- UM1 CA176726 NCI NIH HHS
- R01 CA058860 NCI NIH HHS
- P50 CA105009 NCI NIH HHS
- R01-CA122443 NIH HHS
- 076113 Wellcome Trust
- G0401527 Medical Research Council
- U19-CA148112 NCI NIH HHS
- P50 CA136393 NCI NIH HHS
- C490/A10119 C490/A10124 Cancer Research UK
- 1000143 Medical Research Council
- R01-CA54419 NIH HHS
- C8221/A19170 Cancer Research UK
- R01 CA049449 NCI NIH HHS
- P50 CA159981 NCI NIH HHS
- T32 GM118288 NIGMS NIH HHS
- CA1X01HG007491-01 NIH HHS
- Z01-ES044005 NIEHS NIH HHS
- R01 CA106414 NCI NIH HHS
- R01 CA095023 NCI NIH HHS
- N01 PC067010 NCI NIH HHS
- R01 CA058598 NCI NIH HHS
- U01 CA176726 NCI NIH HHS
- S10 RR025141 NCRR NIH HHS
- M01 RR000056 NCRR NIH HHS
- Department of Health
- 5T32GM118288-03 NIH HHS
- MR/N003284/1 Medical Research Council
- P30 CA014089 NCI NIH HHS
- K07-CA080668 NCI NIH HHS
- 14136 Cancer Research UK
- Worldwide Cancer Research
- MR_UU_12023 Medical Research Council
- R01 CA067262 NCI NIH HHS
- UM1 CA186107 NCI NIH HHS
- P30 CA015083 NCI NIH HHS
- G1000143 Medical Research Council
- R01 CA076016 NCI NIH HHS
- NHGRI NIH HHS
- P01 CA087969 NCI NIH HHS
- R01-CA61107 NCI NIH HHS
- R01-CA58598 NIH HHS
- U19 CA148112 NCI NIH HHS
- ULTR000445 NCATS NIH HHS
- R03 CA115195 NCI NIH HHS
- Wellcome Trust
- Breast Cancer Now
- R01 CA160669 NCI NIH HHS
- R01-CA058860 NIH HHS
- MC_UU_00004/01 Medical Research Council
- C570/A16491 Cancer Research UK
- R01-CA76016 NIH HHS
- R01-CA106414-A2 NIH HHS
- 001 World Health Organization
- Z01 ES049033 Intramural NIH HHS
- R01 CA126841 NCI NIH HHS
- MR/M012190/1 Medical Research Council
- 209057 Wellcome Trust
- R03 CA113148 NCI NIH HHS
- R01 CA149429 NCI NIH HHS
- National Institute of General Medical Sciences
- National Institutes of Health
- CSMC Precision Health Initiative
- Tell Every Amazing Lady About Ovarian Cancer Louisa M. McGregor Ovarian Cancer Foundation
- National Cancer Institute
- National Human Genome Research Institute
- Canadian Institutes of Health Research
- Ovarian Cancer Research Fund
- European Commission’s Seventh Framework Programme
- Army Medical Research and Materiel Command
- National Health & Medical Research Council of Australia
- Cancer Councils of New South Wales, Victoria, Queensland, South Australia and Tasmania and Cancer Foundation of Western Australia
- Ovarian Cancer Australia
- Peter MacCallum Foundation
- University of Erlangen-Nuremberg
- National Kankerplan
- Breast Cancer Now, Institute of Cancer Research
- National Center for Advancing Translational Sciences
- European Commission
- International Agency for Research on Cancer
- Danish Cancer Society
- Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale
- Institut National de la Santé et de la Recherche Médicale
- German Cancer Aid; German Cancer Research Center
- Federal Ministry of Education and Research
- Hellenic Health Foundation
- Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy
- National Research Council
- Dutch Ministry of Public Health, Welfare and Sports
- Netherlands Cancer Registry
- LK Research Funds
- Dutch Prevention Funds
- World Cancer Research Fund
- Nordforsk, Nordic Centre of Excellence programme on Food, Nutrition and Health
- Health Research Fund
- Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra
- Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten
- German Federal Ministry of Education and Research, Programme of Clinical Biomedical Research
- German Cancer Research Center
- Rudolf-Bartling Foundation
- Helsinki University Hospital Research Fund
- University of Pittsburgh School of Medicine Dean’s Faculty Advancement Award
- Department of Defense
- NCI
- Swedish Cancer Society, Swedish Research Council, Beta Kamprad Foundation
- Danish Cancer Society, Copenhagen
- Mayo Foundation
- Minnesota Ovarian Cancer Alliance
- Fred C. and Katherine B. Andersen Foundation
- VicHealth and Cancer Council Victoria, Cancer Council Victoria
- National Health and Medical Research Council of Australia
- NHMRC
- DOD Ovarian Cancer Research Program
- Moffitt Cancer Center
- Merck Pharmaceuticals
- Radboud University Medical Centre
- UK National Institute for Health Research Biomedical Research Centres at the University of Cambridge
- National Institute of Environmental Health Sciences
- The Swedish Cancer Foundation
- The Swedish Research Council
- American Cancer Society
- Celma Mastry Ovarian Cancer Foundation
- Lon V Smith Foundation
- The Eve Appeal
- National Institute for Health Research University College London Hospitals Biomedical Research Centre
- California Cancer Research Program
- National Science Centre
- NIH
|
325
|
Cormier MJ, Pedersen BS, Bayrak-Toydemir P, Quinlan AR. Combining genetic constraint with predictions of alternative splicing to prioritize deleterious splicing in rare disease studies. BMC Bioinformatics 2022; 23:482. [PMID: 36376793 PMCID: PMC9664736 DOI: 10.1186/s12859-022-05041-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/28/2022] [Accepted: 11/07/2022] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Despite numerous molecular and computational advances, roughly half of patients with a rare disease remain undiagnosed after exome or genome sequencing. A particularly challenging barrier to diagnosis is identifying variants that cause deleterious alternative splicing at intronic or exonic loci outside of canonical donor or acceptor splice sites. RESULTS Several existing tools predict the likelihood that a genetic variant causes alternative splicing. We sought to extend such methods by developing a new metric that aids in discerning whether a genetic variant leads to deleterious alternative splicing. Our metric combines genetic variation in the Genome Aggregation Database with alternative splicing predictions from SpliceAI to compare observed and expected levels of splice-altering genetic variation. We infer genic regions with significantly less splice-altering variation than expected to be constrained. The resulting model of regional splicing constraint captures differential splicing constraint across gene and exon categories, and the most constrained genic regions are enriched for pathogenic splice-altering variants. Building from this model, we developed ConSpliceML. This ensemble machine learning approach combines regional splicing constraint with multiple per-nucleotide alternative splicing scores to guide the prediction of deleterious splicing variants in protein-coding genes. ConSpliceML more accurately distinguishes deleterious and benign splicing variants than state-of-the-art splicing prediction methods, especially in "cryptic" splicing regions beyond canonical donor or acceptor splice sites. CONCLUSION Integrating a model of genetic constraint with annotations from existing alternative splicing tools allows ConSpliceML to prioritize potentially deleterious splice-altering variants in studies of rare human diseases.
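The observed-versus-expected comparison described in this abstract can be illustrated with a toy ranking. The sketch below is not the published ConSplice method: the `Region` type, the counts, and the percentile scoring are hypothetical, and it only shows the general idea that regions with fewer splice-altering variants than a background model predicts are treated as more constrained.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    observed: int    # splice-altering variants seen in population data
    expected: float  # count predicted by an assumed background model


def constraint_percentiles(regions):
    """Map region name to a percentile in (0, 1]; higher means more constrained.

    Regions are ranked by their observed/expected ratio, so the region with
    the greatest depletion of splice-altering variation scores 1.0.
    Ties are broken by input order in this simplified version.
    """
    ratios = {r.name: r.observed / r.expected for r in regions}
    # Sort from least constrained (highest ratio) to most constrained.
    ordered = sorted(ratios, key=ratios.get, reverse=True)
    total = len(ordered)
    return {name: (rank + 1) / total for rank, name in enumerate(ordered)}


# Hypothetical regions and counts for illustration only.
regions = [
    Region("region_A", observed=2, expected=10.0),
    Region("region_B", observed=9, expected=10.0),
    Region("region_C", observed=5, expected=10.0),
]
scores = constraint_percentiles(regions)
for name, score in sorted(scores.items()):
    print(name, round(score, 2))
```

A real implementation would also need a significance test for the depletion and a principled expected-count model, both of which this sketch leaves out.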
|