Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Lee PH, Lee C, Li X, Wee B, Dwivedi T, Daly M. Principles and methods of in-silico prioritization of non-coding regulatory variants. Hum Genet 2018;137:15-30. [PMID: 29288389 PMCID: PMC5892192 DOI: 10.1007/s00439-017-1861-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 10/24/2017] [Accepted: 12/14/2017] [Indexed: 12/13/2022]

Number	Cited by Other Article(s)
1	SUMMIT-FA: a new resource for improved transcriptome imputation using functional annotations. Hum Mol Genet 2024;33:624-635. [PMID: 38129112 PMCID: PMC10954367 DOI: 10.1093/hmg/ddad205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 10/24/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023]
Abstract: Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), which improves gene expression prediction accuracy by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models in whole blood using SUMMIT-FA with the comprehensive functional database MACIE and eQTL summary-level data from the eQTLGen consortium. We apply these models to GWAS for 24 complex traits and show that SUMMIT-FA identifies significantly more gene-trait associations and improves predictive power for identifying "silver standard" genes compared to several benchmark methods. We further conduct a simulation study to demonstrate the effectiveness of SUMMIT-FA.
Key Words: TWAS; eQTL; Prediction; functional annotations; low-heritability Genes
MeSH Headings: Humans; Transcriptome/genetics; Genome-Wide Association Study/methods; Computer Simulation; Quantitative Trait Loci/genetics; Phenotype; Polymorphism, Single Nucleotide; Genetic Predisposition to Disease
Grants: NHLBI NIH HHS; R03 AG070669 NIA NIH HHS; NINDS NIH HHS; NIDA NIH HHS; NIMH NIH HHS; R03 AG070669 NIH HHS; NCI NIH HHS; NHGRI NIH HHS; National Institutes of Health; UK Biobank recourse under Application; National Cancer Institute; National Human Genome Research Institute; National Heart, Lung, and Blood Institute; National Institute on Drug Abuse; National Institute of Mental Health; National Institute of Neurological Disorders and Stroke
2	NCAD v1.0: a database for non-coding variant annotation and interpretation. J Genet Genomics 2024;51:230-242. [PMID: 38142743 DOI: 10.1016/j.jgg.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract: The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Key Words: Annotation; Database; Non-coding variants; Variant interpretation
MeSH Headings: Humans; Molecular Sequence Annotation; Gene Frequency; Databases, Genetic; Regulatory Sequences, Nucleic Acid/genetics; Genetic Variation/genetics
3	A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies. bioRxiv 2023:2023.10.30.564764. [PMID: 37961350 PMCID: PMC10634938 DOI: 10.1101/2023.10.30.564764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract: Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally-scalable analytical pipeline for functionally-informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits (low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides) in 61,861 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered new associations with lipid traits missed by single-trait analysis, including rare variants within an enhancer of NIPSNAP3A and an intergenic region on chromosome 1.
Collapse Key Words Collapse MESH Headings Collapse Grants R01 HL071250 NHLBI NIH HHS N01HC95164 NHLBI NIH HHS U01 HL054472 NHLBI NIH HHS R01 HL071025 NHLBI NIH HHS UL1 RR033176 NCRR NIH HHS R01 HL112064 NHLBI NIH HHS K26 DK138425 NIDDK NIH HHS 75N92020D00002 NHLBI NIH HHS HHSN268201500003C NHLBI NIH HHS R01 HL113323 NHLBI NIH HHS 75N92020D00005 NHLBI NIH HHS R01 HL104135 NHLBI NIH HHS R35 CA197449 NCI NIH HHS HHSN268201800012C NHLBI NIH HHS N01HC95160 NHLBI NIH HHS R01 HL071251 NHLBI NIH HHS R01 HL120393 NHLBI NIH HHS R01 HL087698 NHLBI NIH HHS U01 DK085524 NIDDK NIH HHS HHSN268201600002C NHLBI NIH HHS R01 HL071259 NHLBI NIH HHS U19 CA203654 NCI NIH HHS N01HC95163 NHLBI NIH HHS HHSN268201500001C NHLBI NIH HHS UL1 TR001079 NCATS NIH HHS U01 HG012064 NHGRI NIH HHS HHSN268201600018C NHLBI NIH HHS HHSN268201800014I NHLBI NIH HHS R01 HL087660 NHLBI NIH HHS R01 AR048797 NIAMS NIH HHS R01 HL092577 NHLBI NIH HHS N01HC95169 NHLBI NIH HHS U01 HL054509 NHLBI NIH HHS 75N92020D00001 NHLBI NIH HHS U01 HL120393 NHLBI NIH HHS R01 HL113338 NHLBI NIH HHS R01 DK117445 NIDDK NIH HHS R01 HL153805 NHLBI NIH HHS R01 AG058921 NIA NIH HHS R01 NS058700 NINDS NIH HHS R01 HL127564 NHLBI NIH HHS HHSN268201800014C NHLBI NIH HHS 75N92020D00003 NHLBI NIH HHS F32 HL085989 NHLBI NIH HHS R01 MH078111 NIMH NIH HHS N01HC95162 NHLBI NIH HHS U01 HL054464 NHLBI NIH HHS R01 HL119443 NHLBI NIH HHS R01 HL105756 NHLBI NIH HHS N01HC95168 NHLBI NIH HHS R01 HL067348 NHLBI NIH HHS R01 HL142711 NHLBI NIH HHS R35 HL135818 NHLBI NIH HHS U01 HL072524 NHLBI NIH HHS HHSN268201700002C NHLBI NIH HHS P30 DK063491 NIDDK NIH HHS R01 HL071051 NHLBI NIH HHS HHSN268201800001C NHLBI NIH HHS HHSN268201700001I NHLBI NIH HHS HHSN268201800013I NIMHD NIH HHS HHSN268201600003C NHLBI NIH HHS U01 HL054457 NHLBI NIH HHS HHSN268201700004I NHLBI NIH HHS N01HC95165 NHLBI NIH HHS N01HC95159 NHLBI NIH HHS HHSN268201500001I NHLBI NIH HHS HHSN268201800012I NHLBI NIH HHS M01 RR000052 NCRR NIH HHS N01HC95161 NHLBI NIH HHS UL1 TR001420 NCATS NIH 
HHS R01 HL049762 NHLBI NIH HHS 75N92020D00004 NHLBI NIH HHS HHSN268201600004C NHLBI NIH HHS P01 HL045522 NHLBI NIH HHS HHSN268201800011C NHLBI NIH HHS 75N92020D00007 NHLBI NIH HHS U01 HL072518 NHLBI NIH HHS U01 HL137162 NHLBI NIH HHS K99 HG012956 NHGRI NIH HHS M01 RR007122 NCRR NIH HHS HHSN268201500003I NHLBI NIH HHS HHSN268201600001C NHLBI NIH HHS R01 HL059684 NHLBI NIH HHS HHSN268201700005C NHLBI NIH HHS HHSN268201700003C NHLBI NIH HHS HHSN268201700001C NHLBI NIH HHS R01 MH078143 NIMH NIH HHS R01 DK071891 NIDDK NIH HHS N01HC95167 NHLBI NIH HHS N01HC25195 NHLBI NIH HHS HHSN268201800015I NHLBI NIH HHS R01 HL071205 NHLBI NIH HHS U01 HL054481 NHLBI NIH HHS 75N92019D00031 NHLBI NIH HHS R03 HL154284 NHLBI NIH HHS R01 MD012765 NIMHD NIH HHS HHSN268201700004C NHLBI NIH HHS UL1 TR000040 NCATS NIH HHS HHSN268201700002I NHLBI NIH HHS U01 HG009088 NHGRI NIH HHS R01 MH083824 NIMH NIH HHS HHSN268201800010I NHLBI NIH HHS HHSN268201700005I NHLBI NIH HHS R01 HL117626 NHLBI NIH HHS 75N92020D00006 NHLBI NIH HHS N01HC95166 NHLBI NIH HHS UL1 TR001881 NCATS NIH HHS HHSN268201800011I NHLBI NIH HHS HHSN268201700003I NHLBI NIH HHS U01 HL054495 NHLBI NIH HHS R01 HL071258 NHLBI NIH HHS R01 HL055673 NHLBI NIH HHS R01 HL092301 NHLBI NIH HHS U01 HL054473 NHLBI NIH HHS Collapse Affiliation(s) Collapse
4	Gene-based burden scores identify rare variant associations for 28 blood biomarkers. BMC Genom Data 2023;24:50. [PMID: 37667186 PMCID: PMC10476296 DOI: 10.1186/s12863-023-01155-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 08/28/2023] [Indexed: 09/06/2023]
Abstract: BACKGROUND: A relevant part of the genetic architecture of complex traits is still unknown; despite the discovery of many disease-associated common variants. Polygenic risk score (PRS) models are based on the evaluation of the additive effects attributable to common variants and have been successfully implemented to assess the genetic susceptibility for many phenotypes. In contrast, burden tests are often used to identify an enrichment of rare deleterious variants in specific genes. Both kinds of genetic contributions are typically analyzed independently. Many studies suggest that complex phenotypes are influenced by both low effect common variants and high effect rare deleterious variants. The aim of this paper is to integrate the effect of both common and rare functional variants for a more comprehensive genetic risk modeling. METHODS: We developed a framework combining gene-based scores based on the enrichment of rare functionally relevant variants with genome-wide PRS based on common variants for association analysis and prediction models. We applied our framework on UK Biobank dataset with genotyping and exome data and considered 28 blood biomarkers levels as target phenotypes. For each biomarker, an association analysis was performed on full cohort using gene-based scores (GBS). The cohort was then split into 3 subsets for PRS construction and feature selection, predictive model training, and independent evaluation, respectively. Prediction models were generated including either PRS, GBS or both (combined). RESULTS: Association analyses of the cohort were able to detect significant genes that were previously known to be associated with different biomarkers. Interestingly, the analyses also revealed heterogeneous effect sizes and directionality highlighting the complexity of the blood biomarkers regulation. However, the combined models for many biomarkers show little or no improvement in prediction accuracy compared to the PRS models. CONCLUSION: This study shows that rare variants play an important role in the genetic architecture of complex multifactorial traits such as blood biomarkers. However, while rare deleterious variants play a strong role at an individual level, our results indicate that classical common variant based PRS might be more informative to predict the genetic susceptibility at the population level.
Key Words: Blood biomarkers; Complex phenotypes; Gene associations; Genetic prediction; PRS; Rare variants
MeSH Headings: Humans; Genetic Predisposition to Disease/genetics; Biomarkers; Phenotype; Exome; Multifactorial Inheritance/genetics
Grants: Deutsche Forschungsgemeinschaft Universitätsklinikum Bonn (8930)
5	The sequence kernel association test for multicategorical outcomes. Genet Epidemiol 2023;47:432-449. [PMID: 37078108 DOI: 10.1002/gepi.22527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 03/29/2023] [Accepted: 03/30/2023] [Indexed: 04/21/2023]
Abstract: Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. In this paper, we proposed a novel set-based association analysis method, sequence kernel association test (SKAT)-MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT-MC effectively preserves the nominal type I error rate while substantially increases the statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish breast cancer study (PBCS), and identified gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER- breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127,127) with SKAT-MC, and identified 21 significant genes in the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT-MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
Key Words: SKAT; multicategorical data; the generalized logit model; the proportional odds model
MeSH Headings: Humans; Female; Genome-Wide Association Study; Genetic Variation; Models, Genetic; Computer Simulation; Breast Neoplasms/genetics
Grants: U24 OD023382 NIH HHS
6	RegVar: Tissue-specific Prioritization of Non-coding Regulatory Variants. Genomics, Proteomics & Bioinformatics 2023;21:385-395. [PMID: 34973416 PMCID: PMC10626172 DOI: 10.1016/j.gpb.2021.08.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2021] [Revised: 06/11/2021] [Accepted: 09/27/2021] [Indexed: 06/14/2023]
Abstract: Non-coding genomic variants constitute the majority of trait-associated genome variations; however, the identification of functional non-coding variants is still a challenge in human genetics, and a method for systematically assessing the impact of regulatory variants on gene expression and linking these regulatory variants to potential target genes is still lacking. Here, we introduce a deep neural network (DNN)-based computational framework, RegVar, which can accurately predict the tissue-specific impact of non-coding regulatory variants on target genes. We show that by robustly learning the genomic characteristics of massive variant-gene expression associations in a variety of human tissues, RegVar vastly surpasses all current non-coding variant prioritization methods in predicting regulatory variants under different circumstances. The unique features of RegVar make it an excellent framework for assessing the regulatory impact of any variant on its putative target genes in a variety of tissues. RegVar is available as a web server at https://regvar.omic.tech/.
Key Words: Deep neural network; Expression quantitative trait locus; Expression regulation; Non-coding variant; Variant prioritization
MeSH Headings: Humans; Neural Networks, Computer; Genomics; Polymorphism, Single Nucleotide; Genome-Wide Association Study
7	Disease-associated non-coding variants alter NKX2-5 DNA-binding affinity. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2023;1866:194906. [PMID: 36690178 PMCID: PMC10013089 DOI: 10.1016/j.bbagrm.2023.194906] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/12/2022] [Revised: 12/30/2022] [Accepted: 01/02/2023] [Indexed: 01/22/2023]
Abstract: Genome-wide association studies (GWAS) have mapped over 90% of disease- or trait-associated variants within the non-coding genome, like cis-regulatory elements (CREs). Non-coding single nucleotide polymorphisms (SNPs) are genomic variants that can change how DNA-binding regulatory proteins, like transcription factors (TFs), interact with the genome and regulate gene expression. NKX2-5 is a TF essential for proper heart development, and mutations affecting its function have been associated with congenital heart diseases (CHDs). However, establishing a causal mechanism between non-coding genomic variants and human disease remains challenging. To address this challenge, we identified 8475 SNPs predicted to alter NKX2-5 DNA-binding using a position weight matrix (PWM)-based predictive model. Five variants were prioritized for in vitro validation; four of them are associated with traits and diseases that impact cardiovascular health. The impact of these variants on NKX2-5 binding was evaluated with electrophoretic mobility shift assay (EMSA) using purified recombinant NKX2-5 homeodomain. Binding curves were constructed to determine changes in binding between variant and reference alleles. Variants rs7350789, rs7719885, rs747334, and rs3892630 increased binding affinity, whereas rs61216514 decreased binding by NKX2-5 when compared to the reference genome. Our findings suggest that differential TF-DNA binding affinity can be key in establishing a causal mechanism of pathogenic variants.
Key Words: Binding affinity; Gene regulation; Non-coding variants; Transcription factors
MeSH Headings: Humans; Genome-Wide Association Study; Transcription Factors/genetics; Transcription Factors/metabolism; DNA-Binding Proteins/metabolism; Regulatory Sequences, Nucleic Acid; DNA/genetics; Homeobox Protein Nkx-2.5/genetics
Grants: R25 GM061151 NIGMS NIH HHS; R25 HG012702 NHGRI NIH HHS; SC1 GM127231 NIGMS NIH HHS
8	TIVAN-indel: a computational framework for annotating and predicting non-coding regulatory small insertions and deletions. Bioinformatics 2023;39:btad060. [PMID: 36707993 PMCID: PMC9900211 DOI: 10.1093/bioinformatics/btad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2022] [Revised: 01/20/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023]
Abstract: MOTIVATION: Small insertion and deletion (sindel) of human genome has an important implication for human disease. One important mechanism for non-coding sindel (nc-sindel) to have an impact on human diseases and phenotypes is through the regulation of gene expression. Nevertheless, current sequencing experiments may lack statistical power and resolution to pinpoint the functional sindel due to lower minor allele frequency or small effect size. As an alternative strategy, a supervised machine learning method can identify the otherwise masked functional sindels by predicting their regulatory potential directly. However, computational methods for annotating and predicting the regulatory sindels, especially in the non-coding regions, are underdeveloped. RESULTS: By leveraging labeled nc-sindels identified by cis-expression quantitative trait loci analyses across 44 tissues in Genotype-Tissue Expression (GTEx), and a compilation of both generic functional annotations and large-scale epigenomic profiles, we develop TIssue-specific Variant Annotation for Non-coding indel (TIVAN-indel), which is a supervised computational framework for predicting non-coding regulatory sindels. As a result, we demonstrate that TIVAN-indel achieves the best prediction performance in both with-tissue prediction and cross-tissue prediction. As an independent evaluation, we train TIVAN-indel from the 'Whole Blood' tissue in GTEx and test the model using 15 immune cell types from an independent study named Database of Immune Cell Expression. Lastly, we perform an enrichment analysis for both true and predicted sindels in key regulatory regions such as chromatin interactions, open chromatin regions and histone modification sites, and find biologically meaningful enrichment patterns. AVAILABILITY AND IMPLEMENTATION: https://github.com/lichen-lab/TIVAN-indel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MeSH Headings: Humans; Quantitative Trait Loci; Epigenomics; Regulatory Sequences, Nucleic Acid; Chromatin; INDEL Mutation
Grants: R35 GM138342 NIGMS NIH HHS; R35 GM142701 NIGMS NIH HHS; National Institutes of Health
9	FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Res 2023;51:D1300-D1311. [PMID: 36350676 PMCID: PMC9825437 DOI: 10.1093/nar/gkac966] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 08/15/2022] [Revised: 09/25/2022] [Accepted: 10/14/2022] [Indexed: 11/11/2022]
Abstract: Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.
MeSH Headings: Humans; Genome, Human; Software; Molecular Sequence Annotation; Genomics; Genotype; Genetic Variation
Grants: P42 ES030990 NIEHS NIH HHS; R35 CA197449 NCI NIH HHS; U01 HG012064 NHGRI NIH HHS; R01 HL163560 NHLBI NIH HHS; U19 CA203654 NCI NIH HHS; U01 HG009088 NHGRI NIH HHS; R03 OD030608 NIH HHS; National Human Genome Research Institute; National Cancer Institute; National Heart, Lung, and Blood Institute
10	Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies. Nat Genet 2023;55:154-164. [PMID: 36564505 PMCID: PMC10084891 DOI: 10.1038/s41588-022-01225-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/16/2021] [Accepted: 10/13/2022] [Indexed: 12/24/2022]
Abstract: Meta-analysis of whole genome sequencing/whole exome sequencing (WGS/WES) studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Existing rare variant meta-analysis approaches are not scalable to biobank-scale WGS data. Here we present MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale WGS/WES studies. MetaSTAAR accounts for relatedness and population structure, can analyze both quantitative and dichotomous traits and boosts the power of rare variant tests by incorporating multiple variant functional annotations. Through meta-analysis of four lipid traits in 30,138 ancestrally diverse samples from 14 studies of the Trans Omics for Precision Medicine (TOPMed) Program, we show that MetaSTAAR performs rare variant meta-analysis at scale and produces results comparable to using pooled data. Additionally, we identified several conditionally significant rare variant associations with lipid traits. We further demonstrate that MetaSTAAR is scalable to biobank-scale cohorts through meta-analysis of TOPMed WGS data and UK Biobank WES data of ~200,000 samples.
Collapse Key Words Collapse MESH Headings Genome-Wide Association Study/methods Whole Genome Sequencing/methods Exome Sequencing Phenotype Lipids/genetics Collapse Grants R01 HL120393 NHLBI NIH HHS U19 CA203654 NCI NIH HHS U01 HG012064 NHGRI NIH HHS HHSN268201800015I NHLBI NIH HHS R35 HL135824 NHLBI NIH HHS HHSN268201800011I NHLBI NIH HHS HHSN268201700003I NHLBI NIH HHS HHSN268201700001I NHLBI NIH HHS U01 HL137162 NHLBI NIH HHS HHSN268201600001C NHLBI NIH HHS HHSN268201600003C NHLBI NIH HHS MC_PC_17228 Medical Research Council R01 HL113323 NHLBI NIH HHS R35 CA197449 NCI NIH HHS R01 HL104135 NHLBI NIH HHS HHSN268201600002C NHLBI NIH HHS HHSN268201500001I NHLBI NIH HHS R01 HL125005 NHLBI NIH HHS U01 DK085524 NIDDK NIH HHS P50 HL118006 NHLBI NIH HHS U01 HL054509 NHLBI NIH HHS U01 HL120393 NHLBI NIH HHS R01 HL153805 NHLBI NIH HHS R01 AG058921 NIA NIH HHS R01 HL113338 NHLBI NIH HHS HHSN268201800012I NHLBI NIH HHS R01 NS058700 NINDS NIH HHS R01 HL127564 NHLBI NIH HHS HHSN268201600004C NHLBI NIH HHS R01 HL163560 NHLBI NIH HHS U01 HL137181 NHLBI NIH HHS R01 MH078111 NIMH NIH HHS HHSN268201700005I NHLBI NIH HHS HHSN268201500003I NHLBI NIH HHS HHSN268201700004I NHLBI NIH HHS R01 HL067348 NHLBI NIH HHS R01 HL142711 NHLBI NIH HHS R35 HL135818 NHLBI NIH HHS U01 HL072524 NHLBI NIH HHS K08 HL141601 NHLBI NIH HHS HHSN268201800010I NHLBI NIH HHS P01 HL045522 NHLBI NIH HHS R01 HL093093 NHLBI NIH HHS R01 DK071891 NIDDK NIH HHS HHSN268201600018C NHLBI NIH HHS N01HC25195 NHLBI NIH HHS R01 HL071205 NHLBI NIH HHS 75N92019D00031 NHLBI NIH HHS R03 HL154284 NHLBI NIH HHS HHSN268201700002I NHLBI NIH HHS T32 CA154274 NCI NIH HHS U01 HG009088 NHGRI NIH HHS P01 HL132825 NHLBI NIH HHS HHSN268201800013I NIMHD NIH HHS R01 HL055673 NHLBI NIH HHS R01 HL092301 NHLBI NIH HHS R03 OD030608 NIH HHS HHSN268201800014I NHLBI NIH HHS Collapse Affiliation(s) Collapse
11	A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nat Methods 2022;19:1599-1611. [PMID: 36303018 PMCID: PMC10008172 DOI: 10.1038/s41592-022-01640-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 11/06/2021] [Accepted: 09/06/2022] [Indexed: 02/07/2023]
Abstract: Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.
Collapse Key Words Collapse MESH Headings Humans Genome-Wide Association Study/methods Whole Genome Sequencing/methods Genome Phenotype Genetic Variation Collapse Grants R01 DK078616 NIDDK NIH HHS U01 HG007417 NHGRI NIH HHS KL2 TR001100 NCATS NIH HHS R01 HL112064 NHLBI NIH HHS N01-HC-95160 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R35 HG010692 NHGRI NIH HHS U01-HL054472 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01-HL142711 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01-DK071891 U.S. Department of Health & Human Services \| NIH \| National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases) F30 HL149180 NHLBI NIH HHS R01 NR019628 NINR NIH HHS R01 HL113323 NHLBI NIH HHS N01-HC-95166 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) UL1RR033176 U.S. Department of Health & Human Services \| NIH \| National Center for Research Resources (NCRR) R01 HL132947 NHLBI NIH HHS P30 DK040561 NIDDK NIH HHS U01 HL137183 NHLBI NIH HHS R01-HL127564 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) P30 CA016672 NCI NIH HHS R01-HL071051 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL104135 NHLBI NIH HHS T32 HL144442 NHLBI NIH HHS R35 CA197449 NCI NIH HHS P30 ES010126 NIEHS NIH HHS DP5 OD029586 NIH HHS R01-NS058700 U.S. Department of Health & Human Services \| NIH \| National Institute of Neurological Disorders and Stroke (NINDS) R01 HL123915 NHLBI NIH HHS R01 HL120393 NHLBI NIH HHS R01HL071259 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL046380 NHLBI NIH HHS R01HL071251, R01HL071258, R01HL071259 U.S. 
Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) U54 HG003067 NHGRI NIH HHS 75N92020D00003 NHLBI NIH HHS K01 AG059898 NIA NIH HHS U01 DK085524 NIDDK NIH HHS KL2 TR002542 NCATS NIH HHS R01-HL055673-18S1 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R03 HL141439 NHLBI NIH HHS HHSN268201500001I NHLBI NIH HHS R01-MH078143, R01-MH078111, R01-MH083824 U.S. Department of Health & Human Services \| NIH \| National Institute of Mental Health (NIMH) U01 DK062413 NIDDK NIH HHS R01 HL109946 NHLBI NIH HHS U01-HL054495 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) K01 HL136700 NHLBI NIH HHS U19 CA203654 NCI NIH HHS R01-DK078616 U.S. Department of Health & Human Services \| NIH \| National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases) U01 HL080295 NHLBI NIH HHS NO1-HC-25195 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HG006703 NHGRI NIH HHS UL1-TR-001420 U.S. Department of Health & Human Services \| NIH \| National Center for Advancing Translational Sciences (NCATS) U01 HG012064 NHGRI NIH HHS R35-CA197449 U.S. Department of Health & Human Services \| NIH \| National Cancer Institute (NCI) P30 ES005605 NIEHS NIH HHS R01 AR042742 NIAMS NIH HHS R21 HL140385 NHLBI NIH HHS HHSN268201800015I NHLBI NIH HHS U01 HL130114 NHLBI NIH HHS R01 HL117191 NHLBI NIH HHS R01 HG009974 NHGRI NIH HHS U01-HL054473 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 DK113003 NIDDK NIH HHS UL1RR033176 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL059367 NHLBI NIH HHS R24 AG047115 NIA NIH HHS U01-HL137181 U.S. 
Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) P01 HL107202 NHLBI NIH HHS NR0224103 U.S. Department of Health & Human Services \| NIH \| National Institute of Nursing Research (NINR) P50 HL118006 NHLBI NIH HHS U01-HL72518, HL087698, HL49762, HL59684, HL58625, HL071025, HL112064 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) U01 HL120393 NHLBI NIH HHS R01 DK117445 NIDDK NIH HHS R01-AG058921 U.S. Department of Health & Human Services \| NIH \| National Institute on Aging (U.S. National Institute on Aging) R03-HL154284 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1-TR-001881 U.S. Department of Health & Human Services \| NIH \| National Center for Advancing Translational Sciences (NCATS) R01 AG058921 NIA NIH HHS R01 HL129132 NHLBI NIH HHS R01 HL113338 NHLBI NIH HHS HHSN268201800012I NHLBI NIH HHS R01 HL153805 NHLBI NIH HHS R01 DK072193 NIDDK NIH HHS R01 HL137922 NHLBI NIH HHS R01 AI079139 NIAID NIH HHS N01-HC-95164 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) U01-DK085524 U.S. Department of Health & Human Services \| NIH \| National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases) U19 AI111224 NIAID NIH HHS R35 HL135824 NHLBI NIH HHS 75N92019D00031 NHLBI NIH HHS R01 DK110113 NIDDK NIH HHS N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) N01-HC-95165 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL138737 NHLBI NIH HHS P30 DK079626 NIDDK NIH HHS R01 NS058700 NINDS NIH HHS R01 HL127564 NHLBI NIH HHS T32 HG000040 NHGRI NIH HHS DK063491 U.S. 
Department of Health & Human Services \| NIH \| National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases) R01 HL141845 NHLBI NIH HHS R01 DK075787 NIDDK NIH HHS R01 AR072199 NIAMS NIH HHS R01 HL120854 NHLBI NIH HHS R01 HL163560 NHLBI NIH HHS R01HL071258 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) U01-HG009088 U.S. Department of Health & Human Services \| NIH \| National Human Genome Research Institute (NHGRI) R01 HL163972 NHLBI NIH HHS K23 HL123778 NHLBI NIH HHS U01 HL137181 NHLBI NIH HHS R01 MH078111 NIMH NIH HHS HHSN268201700005I NHLBI NIH HHS N01-HC-95159 U.S. Department of Health & Human Services \| National Institutes of Health (NIH) R01-HL113323 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL141944 NHLBI NIH HHS R01 HL119443 NHLBI NIH HHS R01-HL071051, R01-HL071205, R01HL071250 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) P60-AG10484 U.S. Department of Health & Human Services \| NIH \| National Institute on Aging (U.S. National Institute on Aging) 75N92020D00007 NHLBI NIH HHS UM1 AI068634 NIAID NIH HHS HHSN268201500003I NHLBI NIH HHS HHSN268201700004I NHLBI NIH HHS N01-HC-95163 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01-HL071205 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) F30 HL107066 NHLBI NIH HHS R01-HL153805 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL105756 NHLBI NIH HHS K01 HL125751 NHLBI NIH HHS R01 HL067348 NHLBI NIH HHS T32 HL007208 NHLBI NIH HHS R01 HL142711 NHLBI NIH HHS R35 HL135818 NHLBI NIH HHS R01-HL92301 U.S. 
Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) T32 GM074897 NIGMS NIH HHS I01 BX005295 BLRD VA 75N92020D00001 NHLBI NIH HHS R01 HL113326 NHLBI NIH HHS R00 HL129045 NHLBI NIH HHS UL1-TR-000040 U.S. Department of Health & Human Services \| NIH \| National Center for Advancing Translational Sciences (NCATS) UL1-TR-001079 U.S. Department of Health & Human Services \| NIH \| National Center for Advancing Translational Sciences (NCATS) U01 HL072524 NHLBI NIH HHS R35-HL135818 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) K08 HL140203 NHLBI NIH HHS N01-HC-95162 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) K08 HL141601 NHLBI NIH HHS 75N92020D00005 NHLBI NIH HHS R01-DK117445 U.S. Department of Health & Human Services \| NIH \| National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases) R01-AR48797 U.S. Department of Health & Human Services \| NIH \| National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) R56 AG058543 NIA NIH HHS U19 AI077439 NIAID NIH HHS R01 HL142028 NHLBI NIH HHS 75N92020D00004 NHLBI NIH HHS HHSN268201800011I NHLBI NIH HHS R35 GM127131 NIGMS NIH HHS U01 HL137880 NHLBI NIH HHS R01 HG010869 NHGRI NIH HHS R01-HL133040 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) HHSN268201700003I NHLBI NIH HHS R01HL071250 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) N01-HC-95168 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL148239 NHLBI NIH HHS U01-HL137162 U.S. 
Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 AI132476 NIAID NIH HHS T32 GM007205 NIGMS NIH HHS HHSN268201800010I NHLBI NIH HHS R01-HL092577-06S1 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) UL1-TR-001881 U.S. Department of Health & Human Services \| NIH \| National Center for Advancing Translational Sciences (NCATS) R01-HL104135-04S1 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL132320 NHLBI NIH HHS U01 DK078616 NIDDK NIH HHS HHSN268201700001I NHLBI NIH HHS R01-HL141944 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) U01 HL137162 NHLBI NIH HHS R01 HG005701 NHGRI NIH HHS 75N92020D00001, 75N92020D00002, 75N92020D00003, 75N92020D00004 U.S. Department of Health & Human Services \| National Institutes of Health (NIH) R01 HL143221 NHLBI NIH HHS R01 HL142992 NHLBI NIH HHS K01 HL129039 NHLBI NIH HHS R01 HL133870 NHLBI NIH HHS R01 DA037904 NIDA NIH HHS R21 HL123677 NHLBI NIH HHS R01 DK071891 NIDDK NIH HHS HHSN268201800001I U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) 75N92020D00002 NHLBI NIH HHS K01 HL130609 NHLBI NIH HHS N01-HC-95167 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) T32 HL007374 NHLBI NIH HHS N01-HC-95169 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) U01-DK078616 U.S. Department of Health & Human Services \| NIH \| National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases) R01 AR063611 NIAMS NIH HHS KL2TR002490 U.S. Department of Health & Human Services \| NIH \| National Center for Advancing Translational Sciences (NCATS) R03 HL154284 NHLBI NIH HHS M01-RR000052 U.S. 
Department of Health & Human Services \| NIH \| National Center for Research Resources (NCRR) 75N92020D00006 NHLBI NIH HHS S10 OD020069 NIH HHS R01 MD012765 NIMHD NIH HHS N01-HC-95161 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) HHSN268201700002I NHLBI NIH HHS R01 HL151855 NHLBI NIH HHS K23 HL138461 NHLBI NIH HHS U01 CA182913 NCI NIH HHS UG3 HL151865 NHLBI NIH HHS F32 HL150992 NHLBI NIH HHS R01-MD012765 U.S. Department of Health & Human Services \| NIH \| National Institute on Minority Health and Health Disparities (NIMHD) 75N92020D00005, 75N92020D00006, 75N92020D00007 U.S. Department of Health & Human Services \| National Institutes of Health (NIH) R01 MH101244 NIMH NIH HHS U01 HG009088 NHGRI NIH HHS N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) P42 ES016454 NIEHS NIH HHS UM1 DK078616 NIDDK NIH HHS U01-HL054509 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R35-HL135824 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) M01-RR07122 U.S. Department of Health & Human Services \| NIH \| National Center for Research Resources (NCRR) U01 DK105561 NIDDK NIH HHS U01-HL072524 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) P20 GM121334 NIGMS NIH HHS N01-HC-95167, N01-HC-95168, N01-HC-95169 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R01 HL131565 NHLBI NIH HHS R01HL071251 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) R13 CA124365 NCI NIH HHS R01-HL045522 U.S. 
Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) P01 HL132825 NHLBI NIH HHS R01 HL118267 NHLBI NIH HHS HHSN268201800013I NIMHD NIH HHS R01-HL67348 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) U54 GM115428 NIGMS NIH HHS R01 HL055673 NHLBI NIH HHS HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) UM1-DK078616 U.S. Department of Health & Human Services \| NIH \| National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases) R01 HL149683 NHLBI NIH HHS R01 HL092301 NHLBI NIH HHS P30 DK020595 NIDDK NIH HHS R01 HL149836 NHLBI NIH HHS K08 HL145095 NHLBI NIH HHS K01 HL135405 NHLBI NIH HHS R03 OD030608 NIH HHS HHSN268201800014I NHLBI NIH HHS R01-HL113338 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) F32-HL085989 U.S. Department of Health & Human Services \| NIH \| National Heart, Lung, and Blood Institute (NHLBI) UM1 AI068636 NIAID NIH HHS R01 AG057381 NIA NIH HHS U19-CA203654 U.S. Department of Health & Human Services \| NIH \| National Cancer Institute (NCI)
12	The Role of Long Noncoding RNAs on Male Infertility: A Systematic Review and In Silico Analysis. BIOLOGY 2022;11:biology11101510. [PMID: 36290414 PMCID: PMC9598197 DOI: 10.3390/biology11101510] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/18/2022] [Revised: 10/08/2022] [Accepted: 10/13/2022] [Indexed: 11/16/2022] Abstract Male infertility is a complex disorder affecting many couples worldwide. Long noncoding RNAs (lncRNAs) regulate important cellular processes; however, a comprehensive understanding of their role in male infertility is limited. This systematic review investigates the differential expressions of lncRNAs in male infertility or variations in lncRNA regions associated with it. The PRISMA guidelines were used to search Pubmed and Web of Science (1 June 2022). Inclusion criteria were human participants, patients diagnosed with male infertility, and English language speakers. We also performed an in silico analysis investigating lncRNAs that are reported in many subtypes of male infertility. A total of 625 articles were found, and after the screening and eligibility stages, 20 studies were included in the final sample. Many lncRNAs are deregulated in male infertility, and interactions between lncRNAs and miRNAs play an important role. However, there is a knowledge gap regarding the impact of variants found in lncRNA regions. Furthermore, eight lncRNAs were identified as differentially expressed in many subtypes of male infertility. After in silico analysis, gene ontology (GO) and KEGG enrichment analysis of the genes targeted by them revealed their association with bladder and prostate cancer. 
However, pathways involved in general in tumorigenesis and cancer development of all types, such as p53 pathways, apoptosis, and cell death, were also enriched, indicating a link between cancer and male infertility. This evidence, however, is preliminary. Future research is needed to explore the exact mechanism of action of the identified lncRNAs and investigate the association between male infertility and cancer. Key Words: asthenozoospermia; azoospermia; cancer; in silico; long noncoding RNAs (lncRNAs); male infertility; oligozoospermia; teratozoospermia.
13	Natural and Experimental Rewiring of Gene Regulatory Regions. Annu Rev Genomics Hum Genet 2022;23:73-97. [PMID: 35472292 DOI: 10.1146/annurev-genom-112921-010715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Abstract The successful development and ongoing functioning of complex organisms depend on the faithful execution of the genetic code. A critical step in this process is the correct spatial and temporal expression of genes. The highly orchestrated transcription of genes is controlled primarily by cis-regulatory elements: promoters, enhancers, and insulators. The medical importance of this key biological process can be seen by the frequency with which mutations and inherited variants that alter cis-regulatory elements lead to monogenic and complex diseases and cancer. Here, we provide an overview of the methods available to characterize and perturb gene regulatory circuits. We then highlight mechanisms through which regulatory rewiring contributes to disease, and conclude with a perspective on how our understanding of gene regulation can be used to improve human health. Key Words: CRISPR-Cas9; enhancers; genome editing; genome-wide association studies; noncoding disease; promoters. MESH Headings: Enhancer Elements, Genetic; Gene Expression Regulation; Gene Regulatory Networks; Humans; Mutation; Promoter Regions, Genetic. Grants: MR/T014067/1 Medical Research Council; MR/X001210/1 Medical Research Council; 106130/Z/14/Z Wellcome Trust; MR/N00969X/1 Medical Research Council; MC_UU_00016/14 Medical Research Council.
14	3DFAACTS-SNP: using regulatory T cell-specific epigenomics data to uncover candidate mechanisms of type 1 diabetes (T1D) risk. Epigenetics Chromatin 2022;15:24. [PMID: 35773720 PMCID: PMC9244893 DOI: 10.1186/s13072-022-00456-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Accepted: 06/06/2022] [Indexed: 11/26/2022] Open Abstract Background Genome-wide association studies (GWAS) have enabled the discovery of single nucleotide polymorphisms (SNPs) that are significantly associated with many autoimmune diseases including type 1 diabetes (T1D). However, many of the identified variants lie in non-coding regions, limiting the identification of mechanisms that contribute to autoimmune disease progression. To address this problem, we developed a variant filtering workflow called 3DFAACTS-SNP to link genetic variants to target genes in a cell-specific manner. Here, we use 3DFAACTS-SNP to identify candidate SNPs and target genes associated with the loss of immune tolerance in regulatory T cells (Treg) in T1D. Results Using 3DFAACTS-SNP, we identified from a list of 1228 previously fine-mapped variants, 36 SNPs with plausible Treg-specific mechanisms of action. The integration of cell type-specific chromosome conformation capture data in 3DFAACTS-SNP identified 266 regulatory regions and 47 candidate target genes that interact with these variant-containing regions in Treg cells. We further demonstrated the utility of the workflow by applying it to three other SNP autoimmune datasets, identifying 16 Treg-centric candidate variants and 60 interacting genes. Finally, we demonstrate the broad utility of 3DFAACTS-SNP for functional annotation of all known common (> 10% allele frequency) variants from the Genome Aggregation Database (gnomAD). 
We identified 9376 candidate variants and 4968 candidate target genes, generating a list of potential sites for future T1D or other autoimmune disease research. Conclusions We demonstrate that it is possible to further prioritise variants that contribute to T1D based on regulatory function, and illustrate the power of using cell type-specific multi-omics datasets to determine disease mechanisms. Our workflow can be customised to any cell type for which the individual datasets for functional annotation have been generated, giving broad applicability and utility. Supplementary Information The online version contains supplementary material available at 10.1186/s13072-022-00456-5. Key Words: Autoimmune disease; Functional annotation; Hi-C; Regulatory T cells; Transcription factor binding; Type 1 diabetes.
15	Whole exome sequencing identifies novel germline variants of SLC15A4 gene as potentially cancer predisposing in familial colorectal cancer. Mol Genet Genomics 2022;297:965-979. [PMID: 35562597 PMCID: PMC9250485 DOI: 10.1007/s00438-022-01896-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2021] [Accepted: 04/02/2022] [Indexed: 11/25/2022] Abstract About 15% of colorectal cancer (CRC) patients have first-degree relatives affected by the same malignancy. However, for most families the cause of familial aggregation of CRC is unknown. To identify novel high-to-moderate-penetrance germline variants underlying CRC susceptibility, we performed whole exome sequencing (WES) on four CRC cases and two unaffected members of a Polish family without any mutation in known CRC predisposition genes. After WES, we used our in-house developed Familial Cancer Variant Prioritization Pipeline and identified two novel variants in the solute carrier family 15 member 4 (SLC15A4) gene. The heterozygous missense variant, p.Y444C, was predicted to affect the phylogenetically conserved PTR2/POT domain and to have a deleterious effect on the function of the encoded peptide/histidine transporter. The other variant was located in the upstream region of the same gene (GRCh37.p13, 12_129308531_C_T; 43 bp upstream of transcription start site, ENST00000266771.5) and it was annotated to affect the promoter region of SLC15A4 as well as binding sites of 17 different transcription factors. Our findings of two distinct variants in the same gene may indicate a synergistic up-regulation of SLC15A4 as the underlying genetic cause and implicate this gene for the first time in genetic inheritance of familial CRC. 
Key Words: Familial colorectal cancer; Germline variant; SLC15A4; Whole exome sequencing. MESH Headings: Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Genetic Predisposition to Disease; Germ Cells/pathology; Germ-Line Mutation; Humans; Membrane Transport Proteins/genetics; Nerve Tissue Proteins/genetics; Pedigree; Exome Sequencing. Grants: European Cooperation in Science and Technology; Bundesministerium für Bildung und Forschung; H2020 European Research Council; Deutsches Krebsforschungszentrum (DKFZ) (1052).
16	A multi-dimensional integrative scoring framework for predicting functional variants in the human genome. Am J Hum Genet 2022;109:446-456. [PMID: 35216679 PMCID: PMC8948160 DOI: 10.1016/j.ajhg.2022.01.017] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2021] [Accepted: 01/26/2022] [Indexed: 12/26/2022] Open Abstract Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium. 
Key Words: EM algorithm; functional annotations; generalized linear mixed model; multi-dimensional integrated scores; prediction of functional effect. MESH Headings: Genome, Human/genetics; Genome-Wide Association Study/methods; Genomics; Humans; Molecular Sequence Annotation; Polymorphism, Single Nucleotide/genetics; Probability. Grants: P42 ES030990 NIEHS NIH HHS; U19 CA203654 NCI NIH HHS; U01 HG012064 NHGRI NIH HHS; RF1 AG072272 NIA NIH HHS; R01 HL113338 NHLBI NIH HHS; U01 HG009088 NHGRI NIH HHS; R01 MH095797 NIMH NIH HHS; R01 MH106910 NIMH NIH HHS; R35 CA197449 NCI NIH HHS.
17	GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data. Nucleic Acids Res 2022;50:2522-2535. [PMID: 35234913 PMCID: PMC8934622 DOI: 10.1093/nar/gkac130] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 02/02/2022] [Accepted: 02/14/2022] [Indexed: 11/25/2022] Open Abstract Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses. 
MESH Headings: Animals; Base Sequence; Mice; Molecular Sequence Annotation; Whole Genome Sequencing. Grants: R6-388 / WT 100127 Wellcome Trust; 203141/Z/16/Z Wellcome Trust; National Institute for Health Research.
18	Advancing drug discovery using the power of the human genome. J Pathol 2021;254:418-429. [PMID: 33748968 PMCID: PMC8251523 DOI: 10.1002/path.5664] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2020] [Revised: 03/11/2021] [Accepted: 03/16/2021] [Indexed: 12/31/2022] Abstract Human genetics plays an increasingly important role in drug development and population health. Here we review the history of human genetics in the context of accelerating the discovery of therapies, present examples of how human genetics evidence supports successful drug targets, and discuss how polygenic risk scores could be beneficial in various clinical settings. We highlight the value of direct-to-consumer platforms in the era of fast-paced big data biotechnology, and how diverse genetic and health data can benefit society. © 2021 23andMe, Inc. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland. Key Words: GWAS; direct-to-consumer; drug development; human genetics; polygenic risk score; precision medicine; therapeutic discovery. MESH Headings: Drug Discovery; Genome, Human; Humans.
19	Cell-type-specific effects of genetic variation on chromatin accessibility during human neuronal differentiation. Nat Neurosci 2021;24:941-953. [PMID: 34017130 PMCID: PMC8254789 DOI: 10.1038/s41593-021-00858-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2020] [Accepted: 04/15/2021] [Indexed: 02/03/2023] Abstract Common genetic risk for neuropsychiatric disorders is enriched in regulatory elements active during cortical neurogenesis. However, it remains poorly understood as to how these variants influence gene regulation. To model the functional impact of common genetic variation on the noncoding genome during human cortical development, we performed the assay for transposase accessible chromatin using sequencing (ATAC-seq) and analyzed chromatin accessibility quantitative trait loci (QTL) in cultured human neural progenitor cells and their differentiated neuronal progeny from 87 donors. We identified significant genetic effects on 988/1,839 neuron/progenitor regulatory elements, with highly cell-type and temporally specific effects. A subset (roughly 30%) of chromatin accessibility-QTL were also associated with changes in gene expression. Motif-disrupting alleles of transcriptional activators generally led to decreases in chromatin accessibility, whereas motif-disrupting alleles of repressors led to increases in chromatin accessibility. By integrating cell-type-specific chromatin accessibility-QTL and brain-relevant genome-wide association data, we were able to fine-map and identify regulatory mechanisms underlying noncoding neuropsychiatric disorder risk loci. 
MESH Headings: Cell Differentiation/physiology; Chromatin/genetics; Gene Expression Regulation, Developmental/genetics; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; Genome-Wide Association Study; Humans; Mental Disorders/genetics; Neural Stem Cells/physiology; Neurogenesis/genetics; Neurons/physiology; Quantitative Trait Loci/genetics; Regulatory Elements, Transcriptional/genetics; Transcription Factors/genetics. Grants: R01 MH110928 NIMH NIH HHS; P30 ES010126 NIEHS NIH HHS; R01 MH120125 NIMH NIH HHS; R56 MH114901 NIMH NIH HHS; U01 MH103365 NIMH NIH HHS; R01 MH110905 NIMH NIH HHS; R21 MH109956 NIMH NIH HHS; P30 NS045892 NINDS NIH HHS; U01 MH103339 NIMH NIH HHS; R01 MH110920 NIMH NIH HHS; U01 MH116487 NIMH NIH HHS; R21 MH105881 NIMH NIH HHS; R01 MH110921 NIMH NIH HHS; R01 MH110926 NIMH NIH HHS; U01 MH116488 NIMH NIH HHS; U01 MH116438 NIMH NIH HHS; U01 MH116442 NIMH NIH HHS; R01 MH094714 NIMH NIH HHS; R01 MH117292 NIMH NIH HHS; R21 MH103877 NIMH NIH HHS; U01 MH116489 NIMH NIH HHS; P50 HD103573 NICHD NIH HHS; R01 MH110927 NIMH NIH HHS; P30 DK034987 NIDDK NIH HHS; U01 MH116441 NIMH NIH HHS; R56 MH114899 NIMH NIH HHS; U01 MH103392 NIMH NIH HHS; R01 MH117291 NIMH NIH HHS; R01 MH117293 NIMH NIH HHS; P50 CA058223 NCI NIH HHS; R01 MH109677 NIMH NIH HHS; U01 MH103346 NIMH NIH HHS; R37 MH060233 NIMH NIH HHS; U54 EB020403 NIBIB NIH HHS; R56 MH114911 NIMH NIH HHS; P30 AI028697 NIAID NIH HHS; R00 MH102357 NIMH NIH HHS; R01 MH111721 NIMH NIH HHS; U01 MH103340 NIMH NIH HHS; R21 MH102791 NIMH NIH HHS; R01 MH105898 NIMH NIH HHS; R01 MH118349 NIMH NIH HHS; R21 MH105853 NIMH NIH HHS; R01 MH109715 NIMH NIH HHS; U01 MH116492 NIMH NIH HHS; P50 MH106934 NIMH NIH HHS.
20	Genetics of canine myxomatous mitral valve disease. Anim Genet 2021;52:409-421. [PMID: 34028063 DOI: 10.1111/age.13082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/03/2021] [Indexed: 12/26/2022] Abstract Myxomatous mitral valve disease (MMVD) is the most common heart disease and cause of cardiac death in domestic dogs. MMVD is characterised by slow progressive myxomatous degeneration from the tips of the mitral valves onwards with subsequent mitral valve regurgitation, and left atrial and ventricular dilatation. Although the disease usually has a long asymptomatic period, in dogs with severe disease, mortality is typically secondary to left-sided congestive heart failure. Although it is not uncommon for dogs to survive long enough in the asymptomatic period to die from unrelated causes, a proportion of dogs rapidly advance into congestive heart failure. Heightened prevalence in certain breeds, such as the Cavalier King Charles Spaniel, has indicated that MMVD is under a genetic influence. The genetic characterisation of the factors that underlie the difference in progression of disease is of strong interest to those concerned with dog longevity and welfare. Advanced genomic technologies have the potential to provide information that may impact treatment, prevalence, or severity of MMVD through the elucidation of pathogenic mechanisms and the detection of predisposing genetic loci of major effect. Here we describe briefly the clinical nature of the disorder and consider the physiological mechanisms that might impact its occurrence in the domestic dog. Using results from comparative genomics we suggest possible genetic approaches for identifying genetic risk factors within breeds. The Cavalier King Charles Spaniel breed represents a robust resource for uncovering the genetic basis of MMVD. 
Key Words: congestive heart failure; dog; endocardiosis; genetics; heart; mitral valve; myxomatous mitral valve disease
21	Pleiotropy and Cross-Disorder Genetics Among Psychiatric Disorders. Biol Psychiatry 2021;89:20-31. [PMID: 33131714 PMCID: PMC7898275 DOI: 10.1016/j.biopsych.2020.09.026] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/14/2020] [Revised: 08/28/2020] [Accepted: 09/30/2020] [Indexed: 12/20/2022] Abstract Genome-wide analyses of common and rare genetic variations have documented the heritability of major psychiatric disorders, established their highly polygenic genetic architecture, and identified hundreds of contributing variants. In recent years, these studies have illuminated another key feature of the genetic basis of psychiatric disorders: the important role and pervasive nature of pleiotropy. It is now clear that a substantial fraction of genetic influences on psychopathology transcend clinical diagnostic boundaries. In this review, we summarize evidence in psychiatry for pleiotropy at multiple levels of analysis: from overall genome-wide correlation to biological pathways and down to the level of individual loci. We examine underlying mechanisms of observed pleiotropy, including genetic effects on neurodevelopment, diverse actions of regulatory elements, mediated effects, and spurious associations of genomic variation with multiple phenotypes. We conclude with an exploration of the implications of pleiotropy for understanding the genetic basis of psychiatric disorders, informing nosology, and advancing the aims of precision psychiatry and genomic medicine. 
Key Words: Cross-disorder; GWAS; Genetic correlation; Nosology; Pleiotropy; Precision psychiatry; Psychiatric genetics
MESH Headings: Genetic Predisposition to Disease; Genome-Wide Association Study; Genomics; Humans; Mental Disorders/genetics; Multifactorial Inheritance; Phenotype
Grants: R00 MH101367 NIMH NIH HHS; R01 MH118233 NIMH NIH HHS; R01 MH119243 NIMH NIH HHS
22	Involvement of lncRNAs in celiac disease pathogenesis. International Review of Cell and Molecular Biology 2020. [PMID: 33707056 DOI: 10.1016/bs.ircmb.2020.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/15/2023] Abstract Celiac disease (CD) is an immune-mediated disease that develops in genetically susceptible individuals upon gluten exposure. Human Leukocyte Antigen (HLA) genes in the Major Histocompatibility Complex (MHC) have been described to represent 40% of the genetic risk to develop CD. Aiming to gain understanding of the genetic involvement in CD, high throughput studies have been performed, revealing that many CD-associated variants are located in non-coding regions, hindering the study of the functional implications of these single nucleotide polymorphisms (SNPs). In the last decade, long non-coding RNAs (lncRNAs) have been described to be influenced by disease-associated SNPs and to drive many important mechanisms involved in the development of inflammatory diseases. Here we describe the lncRNAs identified and characterized in the context of celiac disease and highlight the importance of the study of these molecules in inflammatory and autoimmune disorders.
Key Words: Celiac disease; Immune response; Inflammation; Long non-coding RNA; SNP
23	Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat Genet 2020;52:969-983. [PMID: 32839606 PMCID: PMC7483769 DOI: 10.1038/s41588-020-0676-4] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2019] [Accepted: 07/02/2020] [Indexed: 12/13/2022] Abstract Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol. 
MESH Headings: Cholesterol, LDL/genetics; Computer Simulation; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; Genome/genetics; Genome-Wide Association Study/methods; Humans; Models, Genetic; Molecular Sequence Annotation/methods; Phenotype; Whole Genome Sequencing/methods
Grants: R35 HL135824 NHLBI NIH HHS; N01HC95164 NHLBI NIH HHS; U01 HL054472 NHLBI NIH HHS; R01 HL071025 NHLBI NIH HHS; R01 HL112064 NHLBI NIH HHS; 75N92020D00002 NHLBI NIH HHS; HHSN268201500003C NHLBI NIH HHS; R01 HL113323 NHLBI NIH HHS; P30 CA016672 NCI NIH HHS; 75N92020D00005 NHLBI NIH HHS; R01 HL104135 NHLBI NIH HHS; R35 CA197449 NCI NIH HHS; P30 ES010126 NIEHS NIH HHS; HHSN268201800012C NHLBI NIH HHS; N01HC95160 NHLBI NIH HHS; R01 HL133040 NHLBI NIH HHS; R01 HL120393 NHLBI NIH HHS; R01 HL087698 NHLBI NIH HHS; R03 HL141439 NHLBI NIH HHS; K01 AG059898 NIA NIH HHS; U01 DK085524 NIDDK NIH HHS; HHSN268201600002C NHLBI NIH HHS; U19 CA203654 NCI NIH HHS; N01HC95163 NHLBI NIH HHS; HHSN268201500001C NHLBI NIH HHS; UL1 TR001079 NCATS NIH HHS; T32 GM135117 NIGMS NIH HHS; HHSN268201600018C NHLBI NIH HHS; HHSN268201800014I NHLBI NIH HHS; U01 HL130114 NHLBI NIH HHS; R01 AR048797 NIAMS NIH HHS; R01 HL092577 NHLBI NIH HHS; P30 ES000002 NIEHS NIH HHS; P50 HL118006 NHLBI NIH HHS; N01HC95169 NHLBI NIH HHS; U01 HL054509 NHLBI NIH HHS; 75N92020D00001 NHLBI NIH HHS; R01 HL113338 NHLBI NIH HHS; R01 AG058921 NIA NIH HHS; R01 NS058700 NINDS NIH HHS; T32 HG000040 NHGRI NIH HHS; U01 HL137181 NHLBI NIH HHS; HHSN268201800014C NHLBI NIH HHS; N01HC95162 NHLBI NIH HHS; 75N92020D00003 NHLBI NIH HHS; F32 HL085989 NHLBI NIH HHS; R01 MH078111 NIMH NIH HHS; R01 HL119443 NHLBI NIH HHS; R01 HL105756 NHLBI NIH HHS; N01HC95168 NHLBI NIH HHS; K01 HL125751 NHLBI NIH HHS; R01 HL067348 NHLBI NIH HHS; R01 HL142711 NHLBI NIH HHS; R35 HL135818 NHLBI NIH HHS; T32 GM074897 NIGMS NIH HHS; P30 DK063491 NIDDK NIH HHS; U01 HL072524 NHLBI NIH HHS; HHSN268201700002C NHLBI NIH HHS; K08 HL141601 NHLBI NIH HHS; HHSN268201800001C NHLBI NIH HHS; HHSN268201700001I NHLBI NIH HHS; HHSN268201800013I NIMHD NIH HHS; HHSN268201600003C NHLBI NIH HHS; HHSN268201700004I NHLBI NIH HHS; N01HC95165 NHLBI NIH HHS; R35 GM127131 NIGMS NIH HHS; N01HC95159 NHLBI NIH HHS; HHSN268201500001I NHLBI NIH HHS; HHSN268201800012I NHLBI NIH HHS; M01 RR000052 NCRR NIH HHS; N01HC95161 NHLBI NIH HHS; UL1 TR001420 NCATS NIH HHS; R01 HL049762 NHLBI NIH HHS; 75N92020D00004 NHLBI NIH HHS; HHSN268201600004C NHLBI NIH HHS; P01 HL045522 NHLBI NIH HHS; HHSN268201800011C NHLBI NIH HHS; 75N92020D00007 NHLBI NIH HHS; U01 HL072518 NHLBI NIH HHS; U01 HL137162 NHLBI NIH HHS; M01 RR007122 NCRR NIH HHS; HHSN268201500003I NHLBI NIH HHS; HHSN268201600001C NHLBI NIH HHS; R01 HL059684 NHLBI NIH HHS; R01 HL093093 NHLBI NIH HHS; HHSN268201700005C NHLBI NIH HHS; U24 HG009446 NHGRI NIH HHS; HHSN268201700003C NHLBI NIH HHS; HHSN268201700001C NHLBI NIH HHS; R01 MH078143 NIMH NIH HHS; R01 DK071891 NIDDK NIH HHS; N01HC95167 NHLBI NIH HHS; U24 CA237617 NCI NIH HHS; N01HC25195 NHLBI NIH HHS; HHSN268201800015I NHLBI NIH HHS; 75N92019D00031 NHLBI NIH HHS; HHSN268201700004C NHLBI NIH HHS; UL1 TR000040 NCATS NIH HHS; P01 CA134294 NCI NIH HHS; HHSN268201700002I NHLBI NIH HHS; R01 MH101244 NIMH NIH HHS; R01 MH083824 NIMH NIH HHS; HHSN268201800010I NHLBI NIH HHS; HHSN268201700005I NHLBI NIH HHS; U01 HG009088 NHGRI NIH HHS; P42 ES016454 NIEHS NIH HHS; R01 HL117626 NHLBI NIH HHS; 75N92020D00006 NHLBI NIH HHS; N01HC95166 NHLBI NIH HHS; UL1 TR001881 NCATS NIH HHS; HHSN268201800011I NHLBI NIH HHS; R13 CA124365 NCI NIH HHS; R01 HL134320 NHLBI NIH HHS; HHSN268201700003I NHLBI NIH HHS; U01 HL054495 NHLBI NIH HHS; R01 HL055673 NHLBI NIH HHS; R01 HL092301 NHLBI NIH HHS; U01 HL054473 NHLBI NIH HHS
24	Accuracy of a machine learning muscle MRI-based tool for the diagnosis of muscular dystrophies. Neurology 2020;94:e1094-e1102. [PMID: 32029545 DOI: 10.1212/wnl.0000000000009068] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2019] [Accepted: 10/03/2019] [Indexed: 12/11/2022] Open Abstract OBJECTIVE Genetic diagnosis of muscular dystrophies (MDs) has classically been guided by clinical presentation, muscle biopsy, and muscle MRI data. Muscle MRI suggests diagnosis based on the pattern of muscle fatty replacement. However, patterns overlap between different disorders and knowledge about disease-specific patterns is limited. Our aim was to develop a software-based tool that can recognize muscle MRI patterns and thus aid diagnosis of MDs. METHODS We collected 976 pelvic and lower limbs T1-weighted muscle MRIs from 10 different MDs. Fatty replacement was quantified using Mercuri score and files containing the numeric data were generated. Random forest supervised machine learning was applied to develop a model useful to identify the correct diagnosis. Two thousand different models were generated and the one with highest accuracy was selected. A new set of 20 MRIs was used to test the accuracy of the model, and the results were compared with diagnoses proposed by 4 specialists in the field. RESULTS A total of 976 lower limbs MRIs from 10 different MDs were used. The best model obtained had 95.7% accuracy, with 92.1% sensitivity and 99.4% specificity. When compared with experts on the field, the diagnostic accuracy of the model generated was significantly higher in a new set of 20 MRIs. CONCLUSION Machine learning can help doctors in the diagnosis of muscle dystrophies by analyzing patterns of muscle fatty replacement in muscle MRI. 
This tool can be helpful in daily clinics and in the interpretation of the results of next-generation sequencing tests. CLASSIFICATION OF EVIDENCE This study provides Class II evidence that a muscle MRI-based artificial intelligence tool accurately diagnoses muscular dystrophies.
25	GWAS4D: multidimensional analysis of context-specific regulatory variant for human complex diseases and traits. Nucleic Acids Res 2019;46:W114-W120. [PMID: 29771388 PMCID: PMC6030885 DOI: 10.1093/nar/gky407] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2018] [Accepted: 05/03/2018] [Indexed: 01/04/2023] Open Abstract Genome-wide association studies have generated over thousands of susceptibility loci for many human complex traits, and yet for most of these associations the true causal variants remain unknown. Tissue/cell type-specific prediction and prioritization of non-coding regulatory variants will facilitate the identification of causal variants and underlying pathogenic mechanisms for particular complex diseases and traits. By leveraging recent large-scale functional genomics/epigenomics data, we develop an intuitive web server, GWAS4D (http://mulinlab.tmu.edu.cn/gwas4d or http://mulinlab.org/gwas4d), that systematically evaluates GWAS signals and identifies context-specific regulatory variants. The updated web server includes six major features: (i) updates the regulatory variant prioritization method with our new algorithm; (ii) incorporates 127 tissue/cell type-specific epigenomes data; (iii) integrates motifs of 1480 transcriptional regulators from 13 public resources; (iv) uniformly processes Hi-C data and generates significant interactions at 5 kb resolution across 60 tissues/cell types; (v) adds comprehensive non-coding variant functional annotations; (vi) equips a highly interactive visualization function for SNP-target interaction. Using a GWAS fine-mapped set for 161 coronary artery disease risk loci, we demonstrate that GWAS4D is able to efficiently prioritize disease-causal regulatory variants. 
26	Association between MBL2 haplotypes and dengue severity in children from Rio de Janeiro, Brazil. Mem Inst Oswaldo Cruz 2019;114:e190004. [PMID: 31141020 PMCID: PMC6534340 DOI: 10.1590/0074-02760190004] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Accepted: 04/11/2019] [Indexed: 12/21/2022] Open Abstract BACKGROUND Dengue is an arthropod-borne viral disease with a majority of asymptomatic individuals and clinical manifestations varying from mild fever to severe and potentially lethal forms. An increasing number of genetic studies have outlined the association between host genetic variations and dengue severity. Genes associated to viral recognition and entry, as well as those encoding mediators of the immune response against infection are strong candidates for association studies. OBJECTIVES The aim of this study was to investigate the association between MBL2, CLEC5A, ITGB3 and CCR5 genes and dengue severity in children. METHODS A matched case-control study was conducted and 19 single nucleotide polymorphisms (SNPs) were investigated. FINDINGS No associations were observed in single SNP analysis. However, when MBL2 SNPs were combined in haplotypes, the allele rs7095891G/rs1800450C/rs1800451C/rs4935047A/rs930509G/rs2120131G/rs2099902C was significantly associated to risk of severe dengue under α = 0.05 (aOR = 4.02; p = 0.02). A second haplotype carrying rs4935047G and rs7095891G alleles was also associated to risk (aOR = 1.91; p = 0.04). MAIN CONCLUSIONS This is the first study to demonstrate the association between MBL2 haplotypes and dengue severity in Brazilians including adjustment for genetic ancestry. These results reinforce the role of mannose binding lectin in immune response to DENV.
27	SSS-test: a novel test for detecting positive selection on RNA secondary structure. BMC Bioinformatics 2019;20:151. [PMID: 30898084 PMCID: PMC6429701 DOI: 10.1186/s12859-019-2711-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2018] [Accepted: 03/03/2019] [Indexed: 12/23/2022] Open Abstract Background Long non-coding RNAs (lncRNAs) play an important role in regulating gene expression and are thus important for determining phenotypes. Most attempts to measure selection in lncRNAs have focused on the primary sequence. The majority of small RNAs and at least some parts of lncRNAs must fold into specific structures to perform their biological function. Comprehensive assessments of selection acting on RNAs therefore must also encompass structure. Selection pressures acting on the structure of non-coding genes can be detected within multiple sequence alignments. Approaches of this type, however, have so far focused on negative selection. Thus, a computational method for identifying ncRNAs under positive selection is needed. Results We introduce the SSS-test (test for Selection on Secondary Structure) to identify positive selection and thus adaptive evolution. Benchmarks with biological as well as synthetic controls yield coherent signals for both negative and positive selection, demonstrating the functionality of the test. A survey of a lncRNA collection comprising 15,443 families resulted in 110 candidates that appear to be under positive selection in human. In 26 lncRNAs that have been associated with psychiatric disorders we identified local structures that have signs of positive selection in the human lineage. Conclusions It is feasible to assay positive selection acting on RNA secondary structures on a genome-wide scale. 
The detection of human-specific positive selection in lncRNAs associated with cognitive disorder provides a set of candidate genes for further experimental testing and may provide insights into the evolution of cognitive abilities in humans. Availability The SSS-test and related software is available at: https://github.com/waltercostamb/SSS-test. The databases used in this work are available at: http://www.bioinf.uni-leipzig.de/Software/SSS-test/.
Key Words: Long non-coding RNA; Positive selection; Primate genomes; Psychiatric disorders; RNA secondary structure
28	Estimating contribution of rare non-coding variants to neuropsychiatric disorders. Psychiatry Clin Neurosci 2019;73:2-10. [PMID: 30293238 DOI: 10.1111/pcn.12774] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 07/24/2018] [Indexed: 12/21/2022] Abstract Owing to recent advances in DNA sequencing technology, a number of large-scale comprehensive analyses of genetic variations in protein-coding regions (i.e., whole-exome sequencing studies) have been conducted for neuropsychiatric and neurodevelopmental disorders, such as autism spectrum disorders, intellectual disability, and schizophrenia. These studies, especially those focusing on de novo (newly arising) mutations and extremely rare variants, have successfully identified previously unrecognized disease genes/mutations with a large effect size and deepened our understanding of the biology of neuropsychiatric diseases. Along with the continuously dropping sequencing cost, the target of sequencing studies is now expanding from the exome to the whole human genome. Several pioneering works have provided important insights into the contribution of rare non-coding variants to neuropsychiatric diseases. At the same time, these studies highlight the need for larger sample sizes and improvement in the annotation of non-coding regulatory variants. In this review, key findings from recent studies as well as likely future directions are overviewed.
Key Words: autism; de novo; exome; regulatory element; schizophrenia
29	Functional implication of celiac disease associated lncRNAs in disease pathogenesis. Comput Biol Med 2018;102:369-375. [DOI: 10.1016/j.compbiomed.2018.08.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2018] [Revised: 08/09/2018] [Accepted: 08/09/2018] [Indexed: 12/11/2022]