1
Putative protective genomic variation in the Lithuanian population. Genet Mol Biol 2024; 47:e20230030. [PMID: 38626572 PMCID: PMC11021042 DOI: 10.1590/1678-4685-gmb-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 01/01/2024] [Indexed: 04/18/2024] Open
Abstract
Genomic effect variants associated with survival and protection against complex diseases vary between populations due to microevolutionary processes. The aim of this study was to analyse the diversity and distribution of effect variants in the context of potential positive selection. In total, 475 individuals of Lithuanian origin were genotyped using high-throughput scanning and/or sequencing technologies. Allele frequency analysis for the pre-selected effect variants was performed using the catalogue of single nucleotide polymorphisms. The pre-selected effect variants were compared with variants in primate species to ascertain which allele was derived and potentially protective. Recent positive selection analysis was performed to verify this protective effect. Four variants with significantly different frequencies compared to European populations were identified, while two other variants reached borderline significance. An effect variant in the SLC30A8 gene may protect against type 2 diabetes. The paradox of high rates of type 2 diabetes in the Lithuanian population alongside relatively high frequencies of potentially protective genome variants indicates a lack of knowledge about the interactions between environmental factors, regulatory regions, and other genome variation. Identification of effect variants is a step towards a better understanding of microevolutionary processes, etiopathogenetic mechanisms, and personalised medicine.
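The abstract does not say which statistical test was used to compare allele frequencies between the Lithuanian sample and European populations. As a minimal sketch only, a two-sided two-proportion z-test on allele counts is one common way to make such a comparison. The function name and all counts below are hypothetical and are not taken from the study.

```python
import math


def allele_freq_z_test(alt_a, total_a, alt_b, total_b):
    """Two-sided two-proportion z-test on allele counts.

    alt_*   : number of chromosomes carrying the effect allele
    total_* : number of chromosomes genotyped (2 x individuals for autosomes)
    Returns (freq_a, freq_b, z, p_value).
    """
    freq_a = alt_a / total_a
    freq_b = alt_b / total_b
    # Pooled frequency under the null hypothesis of equal frequencies.
    pooled = (alt_a + alt_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Allele fixed or absent in both samples: no evidence of a difference.
        return freq_a, freq_b, 0.0, 1.0
    z = (freq_a - freq_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return freq_a, freq_b, z, p_value


# Hypothetical counts: 475 individuals give 950 chromosomes in sample A.
result = allele_freq_z_test(300, 950, 200, 1000)
```

With several variants tested at once, the resulting p-values would normally be corrected for multiple testing before calling any difference significant.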
2
Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Res 2024; 52:2212-2230. [PMID: 38364871 PMCID: PMC10954445 DOI: 10.1093/nar/gkae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 01/18/2024] [Accepted: 01/27/2024] [Indexed: 02/18/2024] Open
Abstract
Nonreference sequences (NRSs) are DNA sequences present in global populations but absent from the current human reference genome. However, the extent and functional significance of NRSs in human genomes and populations remain unclear. Here, we de novo assembled 539 genomes from five genetically divergent human populations using long-read sequencing technology, resulting in the identification of 5.1 million NRSs. These were merged into 45284 unique NRSs, with 29.7% being novel discoveries. Among these NRSs, 38.7% were common across the five populations, and 35.6% were population specific. The use of a graph-based pangenome approach allowed for the detection of 565 transcript expression quantitative trait loci on NRSs, with 426 of these being novel findings. Moreover, 26 NRS candidates displayed evidence of adaptive selection within human populations. Genes situated in close proximity to or intersecting with these candidates may be associated with metabolism and type 2 diabetes. Genome-wide association studies revealed 14 NRSs to be significantly associated with eight phenotypes. Additionally, 154 NRSs were found to be in strong linkage disequilibrium with 258 phenotype-associated SNPs in the GWAS catalogue. Our work expands the understanding of human NRSs and provides novel insights into their functions, facilitating evolutionary and biomedical research.
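Linkage disequilibrium between an NRS and a SNP is usually summarised by the squared correlation r^2 between the two sites. The sketch below computes r^2 from phased haplotypes, assuming NRS presence/absence and SNP alleles are both coded as 1/0. The function name and the toy data are hypothetical, and the abstract does not state the threshold the authors used for "strong" LD.

```python
def ld_r2(haplotypes, i, j):
    """Squared correlation (r^2) between biallelic sites i and j.

    haplotypes: list of sequences of 0/1 values, one sequence per phased
    chromosome. Returns NaN when either site is monomorphic.
    """
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n  # frequency of allele 1 at site i
    p_b = sum(h[j] for h in haplotypes) / n  # frequency of allele 1 at site j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")
    return d * d / denom


# Toy example: site 0 is NRS presence/absence, site 1 is a SNP.
perfect = ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)], 0, 1)
independent = ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)], 0, 1)
```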
3
Disentangling archaic introgression and genomic signatures of selection at human immunity genes. Infect Genet Evol 2023; 116:105528. [PMID: 37977419 DOI: 10.1016/j.meegid.2023.105528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 11/04/2023] [Accepted: 11/14/2023] [Indexed: 11/19/2023]
Abstract
Pathogens and infectious diseases have imposed exceptionally strong selective pressure on ancient and modern human genomes and contributed to the current variation in many genes. There is evidence that modern humans acquired immune variants through interbreeding with ancient hominins, but the impact of such variants on human traits is not fully understood. The main objectives of this research were to infer the genetic signatures of positive selection that may be involved in adaptation to infectious diseases and to investigate the function of Neanderthal alleles identified within a set of 50 Lithuanian genomes. Introgressed regions were identified using the machine learning tool ArchIE. Recent positive selection signatures were analysed using iHS. We detected high-scoring signals of positive selection at innate immunity genes (EMB, PARP8, HLAC, and CDSN) and evaluated their interactions with the structural proteins of pathogens. Interactions with human immunodeficiency virus (HIV) 1 and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were identified. Overall, genomic regions introgressed from Neanderthals were shown to be enriched in genes related to immunity, keratinocyte differentiation, and sensory perception.
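The iHS statistic mentioned above is built on extended haplotype homozygosity (EHH): EHH is integrated outward from a core SNP separately for the ancestral and derived alleles, and the log ratio of the two integrals is standardised against variants of similar frequency. The sketch below shows only the EHH building block on phased 0/1 haplotypes. It is an illustration under those assumptions, not the pipeline used in the study, and the function name and data are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """Extended haplotype homozygosity for carriers of `allele` at `core`.

    Considers the sites from `core` to `end` (inclusive, either direction)
    and returns the probability that two randomly chosen carrier
    chromosomes are identical over that whole interval.
    """
    lo, hi = min(core, end), max(core, end)
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # fewer than two carriers: no pairs to compare
    counts = Counter(carriers)
    # Number of identical pairs divided by the total number of pairs.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


haps = [[1, 0, 0], [1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
at_core = ehh(haps, core=0, allele=1, end=0)   # all carriers identical
two_away = ehh(haps, core=0, allele=1, end=2)  # homozygosity has decayed
```

A slow decay of EHH around one allele relative to the other is the pattern that produces an extreme iHS value.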
4
PopTradeOff: A database for exploring population-specificity of adaptive evolution, disease susceptibility, and drug responsiveness. Comput Struct Biotechnol J 2023; 21:3443-3451. [PMID: 37448726 PMCID: PMC10338148 DOI: 10.1016/j.csbj.2023.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 05/26/2023] [Accepted: 06/08/2023] [Indexed: 07/15/2023] Open
Abstract
The influence of adaptive evolution on disease susceptibility has drawn attention; however, the extent of the influence, whether favored mutations also influence drug responses, and whether the associations between the three are population-specific remain unknown. Using a reported deep learning network to integrate seven statistical tests for detecting selection signals, we predicted favored mutations in the genomes of 17 human populations and integrated these favored mutations with reported GWAS sites and drug response-related variants into the database PopTradeOff (http://www.gaemons.net/PopFMIntro). The database also contains genome annotation information on the SNP, sequence, gene, and pathway levels. The preliminary data analyses suggest that substantial associations exist between adaptive evolution, disease susceptibility, and drug responses and that the associations are highly population-specific. The database may be valuable for disease studies, drug development, and personalized medicine.
5
Identifying Genomic Signatures of Positive Selection to Predict Protective Genomic Loci in the Cohort of Lithuanian Clean-Up Workers of the Chornobyl Nuclear Disaster. Curr Issues Mol Biol 2023; 45:2972-2983. [PMID: 37185719 PMCID: PMC10137185 DOI: 10.3390/cimb45040195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 03/29/2023] [Accepted: 03/31/2023] [Indexed: 04/07/2023] Open
Abstract
Some people resist or recover from health challenges better than others. We studied Lithuanian clean-up workers of the Chornobyl nuclear disaster (LCWC) who worked in the harshest conditions and, despite high ionising radiation doses as well as other factors, continue ageing relatively healthily. Thus, we hypothesised that there might be individual features encoded by the genome which act protectively for better adaptiveness and health and which depend on unique positive selection signatures. Whole-genome sequencing was performed for 40 LCWC and a control group composed of 25 men from the general Lithuanian population (LTU). Selective sweep analysis was performed to identify genomic regions which may be under recent positive selection and determine better adaptiveness. Twenty-two autosomal loci with the highest positive selection signature values were identified. Most importantly, unique loci under positive selection were identified in the genomes of the LCWC, which may influence survival and adaptation to extreme conditions and to the disaster itself. Characterising these loci provides a better understanding of the interaction between ongoing microevolutionary processes, multifactorial traits, and diseases. Studying unique groups of disease-resistant individuals could help create new insights for better, more individualised disease diagnostics and prevention strategies.
6
HaploBlocks: Efficient Detection of Positive Selection in Large Population Genomic Datasets. Mol Biol Evol 2023; 40:7040366. [PMID: 36790822 PMCID: PMC9985328 DOI: 10.1093/molbev/msad027] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/27/2022] [Revised: 02/01/2023] [Accepted: 02/06/2023] [Indexed: 02/16/2023] Open
Abstract
Genomic regions under positive selection harbor variation linked, for example, to adaptation. Most tools for detecting positively selected variants have computational resource requirements that render them impractical on population genomic datasets with hundreds of thousands of individuals or more. We have developed and implemented an efficient haplotype-based approach able to scan large datasets and accurately detect positive selection. We achieve this by combining a pattern matching approach based on the positional Burrows-Wheeler transform with model-based inference which only requires the evaluation of closed-form expressions. We evaluate our approach with simulations, and find it to be both sensitive and specific. The computational resource requirements quantified using UK Biobank data indicate that our implementation is scalable to population genomic datasets with millions of individuals. Our approach may serve as an algorithmic blueprint for the era of "big data" genomics: a combinatorial core coupled with statistical inference in closed form.
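The positional Burrows-Wheeler transform (PBWT) keeps haplotypes sorted by their reversed prefixes at every site, so that haplotypes sharing a long match ending at a site sit next to each other. The sketch below builds only these positional prefix orderings for 0/1 haplotypes, following the basic PBWT construction. It is an illustration of the data structure, not the HaploBlocks implementation, and the function name and toy data are hypothetical.

```python
def pbwt_prefix_orders(haplotypes):
    """Return the positional prefix orderings of a 0/1 haplotype panel.

    orders[k] lists haplotype indices sorted by the reversed prefix
    hap[:k][::-1], with ties broken by original index. Each update is a
    stable partition on the allele at site k, so the whole pass costs
    O(number of haplotypes x number of sites).
    """
    n_hap = len(haplotypes)
    n_sites = len(haplotypes[0]) if haplotypes else 0
    order = list(range(n_hap))
    orders = [order]
    for k in range(n_sites):
        # Stable partition: carriers of allele 0 first, then allele 1,
        # each group keeping the relative order from the previous site.
        zeros = [idx for idx in order if haplotypes[idx][k] == 0]
        ones = [idx for idx in order if haplotypes[idx][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


panel = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1]]
orders = pbwt_prefix_orders(panel)
```

Because neighbours in `orders[k]` share the longest matches ending just before site k, a scan of adjacent entries is enough to find shared haplotype blocks without comparing all pairs.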
7
Identification of Key Genes and Pathways Associated with Preeclampsia by a WGCNA and an Evolutionary Approach. Genes (Basel) 2022; 13:genes13112134. [PMID: 36421809 PMCID: PMC9690438 DOI: 10.3390/genes13112134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/18/2022] Open
Abstract
Preeclampsia (PE) is a serious obstetric disease characterized by new-onset hypertension that causes damage to the kidneys, brain, liver, and other organs. To investigate genes with key roles in PE’s pathogenesis and their contributions, we used a microarray dataset of normotensive and PE patients and conducted a weighted gene co-expression network analysis (WGCNA). Cyan and magenta modules that are highly enriched with differentially expressed genes (DEGs) were revealed. By using the molecular complex detection (MCODE) algorithm, we identified five significant clusters in the cyan module protein–protein interaction (PPI) network and nine significant clusters in the magenta module PPI network. Our analyses indicated that (i) human accelerated region (HAR) genes are enriched in the magenta-associated C6 cluster, and (ii) positive selection (PS) genes are enriched in the cyan-associated C3 and C5 clusters. We propose these enriched HAR and PS genes, i.e., EIF4E, EIF5, EIF3M, DDX17, SRSF11, PSPC1, SUMO1, CAPZA1, PSMD14, and MNAT1, including highly connected hub genes, HNRNPA1, RBMX, PRKDC, and RANBP2, as candidate key genes for PE’s pathogenesis. A further clarification of the functions of these PPI clusters and key enriched genes will contribute to the discovery of diagnostic biomarkers for PE and therapeutic intervention targets.
8
Uncovering the extensive trade-off between adaptive evolution and disease susceptibility. Cell Rep 2022; 40:111351. [PMID: 36103812 DOI: 10.1016/j.celrep.2022.111351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 06/13/2022] [Accepted: 08/23/2022] [Indexed: 11/03/2022] Open
Abstract
Favored mutations in the human genome may help carriers adapt to changing environments and lifestyles but may also make them susceptible to specific diseases. The scale and details of the trade-off between adaptive evolution and disease susceptibility are unclear because most favored mutations in different populations remain unidentified. As no single statistical test can discriminate favored mutations from nearby hitchhiking neutral ones, we report a deep-learning network (DeepFavored) that integrates multiple statistical tests and divides the identification of favored mutations into two subtasks. We identify favored mutations in three human populations and analyze the correlation between favored/hitchhiking mutations and genome-wide association study (GWAS) sites. Both favored and hitchhiking neutral mutations are enriched in GWAS sites with population-specific features, and the enrichment and population specificity are prominent in genes in specific Gene Ontology (GO) terms. These findings provide evidence for extensive and population-specific trade-offs between adaptive evolution and disease susceptibility. The unveiled scale helps in understanding and investigating human differences and diseases.
9
Analysis of Common SNPs across Continents Reveals Major Genomic Differences between Human Populations. Genes (Basel) 2022; 13:genes13081472. [PMID: 36011383 PMCID: PMC9408407 DOI: 10.3390/genes13081472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 08/12/2022] [Accepted: 08/17/2022] [Indexed: 12/03/2022] Open
Abstract
Common alleles tend to be more ancient than rare alleles. These common SNPs appeared thousands of years ago and reflect intricate human evolution including various adaptations, admixtures, and migration events. Eighty-four thousand abundant region-specific alleles (ARSAs) that are common in one continent but absent in the rest of the world have been characterized by processing 3100 genomes from 230 populations. Also computed were 17,446 polymorphic sites with regional absence of common alleles (RACAs), which are widespread globally but absent in one region. A majority of these region-specific SNPs were found in Africa. America has the second greatest number of ARSAs (3348) and is even ahead of Europe (1911). Surprisingly, East Asia has the highest number of RACAs (10,524) and the lowest number of ARSAs (362). ARSAs and RACAs have distinct compositions of ancestral versus derived alleles in different geographical regions, reflecting their unique evolution. Genes associated with ARSA and RACA SNPs were identified and their functions were analyzed. The core 100 genes shared by multiple populations and associated with region-specific natural selection were examined. The largest part of them (42%) are related to the nervous system. ARSA and RACA SNPs are important for both association and human evolution studies.
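The two categories defined above can be expressed as a simple rule over per-region allele frequencies: an ARSA is common in exactly one region and absent everywhere else, while a RACA is common everywhere except one region where it is absent. The sketch below encodes that rule. The frequency cut-offs (`common`, `absent`), the function name, and the example values are assumptions for illustration, since the abstract does not give the thresholds the authors used.

```python
def classify_allele(freq_by_region, common=0.10, absent=0.0):
    """Classify an allele as ARSA, RACA, or other from regional frequencies.

    freq_by_region: dict mapping region name to allele frequency.
    common / absent: hypothetical cut-offs for "common" and "absent".
    Returns (label, region), where region is the single region that is
    common (ARSA) or lacking the allele (RACA), or None for "other".
    Intended for three or more regions.
    """
    common_regions = [r for r, f in freq_by_region.items() if f >= common]
    absent_regions = [r for r, f in freq_by_region.items() if f <= absent]
    n = len(freq_by_region)
    if len(common_regions) == 1 and len(absent_regions) == n - 1:
        return ("ARSA", common_regions[0])
    if len(absent_regions) == 1 and len(common_regions) == n - 1:
        return ("RACA", absent_regions[0])
    return ("other", None)


# Hypothetical frequencies for one SNP in four regions.
label = classify_allele(
    {"Africa": 0.25, "Europe": 0.0, "East Asia": 0.0, "America": 0.0}
)
```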
10
Admixture mapping of anthropometric traits in the Black Women's Health Study: evidence of a shared African ancestry component with birth weight and type 2 diabetes. J Hum Genet 2022; 67:331-338. [PMID: 35017682 DOI: 10.1038/s10038-022-01010-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Revised: 12/27/2021] [Accepted: 12/29/2021] [Indexed: 11/09/2022]
Abstract
The prevalence of obesity, type 2 diabetes (T2D), and low birth weight is much higher in African American women than in U.S. white women. Genetic factors may contribute to the excess risk of these conditions. We conducted admixture mapping of body mass index (BMI) at age 18, adult BMI, and adult waist circumference and waist-to-hip ratio adjusted for BMI using 2918 ancestral informative markers in 2596 participants of the Black Women's Health Study. We also searched for evidence of shared African genetic ancestry components among the four examined anthropometric traits and among birth weight and T2D. We found that global percent African ancestry was associated with higher adult BMI. We also found that African ancestry at 9q34 was associated with lower BMI at age 18. Our shared ancestry analysis identified ten genomic regions with local African ancestry associated with multiple traits. Seven out of these ten genomic loci were related to T2D risk. Of special interest is the 12q14-21 region where local African ancestry was associated with low birth weight, low BMI, high BMI-adjusted waist-to-hip ratio, and high T2D risk. Findings in the 12q14-21 genomic locus are consistent with the fetal insulin hypothesis that postulates that low birth weight and T2D have a common genetic basis, and they support the hypothesis of a shared African genetic ancestry component linking low birth weight and T2D in African Americans. Future studies should identify the actual genetic variants responsible for the clustering of these conditions in African Americans.
11
PopHumanVar: an interactive application for the functional characterization and prioritization of adaptive genomic variants in humans. Nucleic Acids Res 2022; 50:D1069-D1076. [PMID: 34664660 PMCID: PMC8728255 DOI: 10.1093/nar/gkab925] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/14/2021] [Revised: 09/17/2021] [Accepted: 09/28/2021] [Indexed: 12/22/2022] Open
Abstract
Adaptive challenges that humans faced as they expanded across the globe left specific molecular footprints that can be decoded in present-day genomes. Different sets of metrics are used to identify genomic regions that have undergone selection. However, there are fewer methods capable of pinpointing the allele ultimately responsible for this selection. Here, we present PopHumanVar, an interactive online application that is designed to facilitate the exploration and thorough analysis of candidate genomic regions by integrating currently available functional and population genomics data. PopHumanVar generates useful summary reports of prioritized variants that are putatively causal of recent selective sweeps. It compiles data and graphically represents different layers of information, including natural selection statistics, as well as functional annotations and genealogical estimations of variant age, for biallelic single nucleotide variants (SNVs) of the 1000 Genomes Project phase 3. Specifically, PopHumanVar amasses SNV-based information from GEVA, SnpEFF, GWAS Catalog, ClinVar, RegulomeDB and DisGeNET databases, as well as accurate estimations of iHS, nSL and iSAFE statistics. Notably, PopHumanVar can successfully identify known causal variants of frequently reported candidate selection regions, including EDAR in East-Asians, ACKR1 (DARC) in Africans and LCT/MCM6 in Europeans. PopHumanVar is open and freely available at https://pophumanvar.uab.cat.
12
Analysis of the Batch Effect Due to Sequencing Center in Population Statistics Quantifying Rare Events in the 1000 Genomes Project. Genes (Basel) 2021; 13:genes13010044. [PMID: 35052384 PMCID: PMC8775088 DOI: 10.3390/genes13010044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 12/19/2021] [Accepted: 12/21/2021] [Indexed: 12/01/2022] Open
Abstract
The 1000 Genomes Project (1000G) is one of the most popular whole genome sequencing datasets used in different genomics fields and has boosted our knowledge in medical and population genomics, among other fields. Recent studies have reported the presence of ghost mutation signals in the 1000G. Furthermore, studies have shown that these mutations can influence the outcomes of follow-up studies based on the genetic variation of 1000G, such as single nucleotide variant (SNV) imputation. While the overall effect of these ghost mutations can be considered negligible for common genetic variants in many populations, the potential bias remains unclear when studying low frequency genetic variants in the population. In this study, we analyze the effect of the sequencing center on predicted loss of function (LoF) alleles, the number of singletons, and the patterns of archaic introgression in the 1000G. Our results support previous studies showing that the sequencing center is associated with LoF and singletons independent of the population that is considered. Furthermore, we observed that patterns of archaic introgression were distorted for some populations depending on the sequencing center. When analyzing the frequency of SNPs showing extreme patterns of genotype differentiation among centers for CEU, YRI, CHB, and JPT, we observed that the magnitude of the sequencing batch effect was stronger at MAF < 0.2 and showed different profiles between CHB and the other populations. All these results suggest that data from 1000G must be interpreted with caution when considering statistics using variants at low frequency.
13
Hearing loss genes reveal patterns of adaptive evolution at the coding and non-coding levels in mammals. BMC Biol 2021; 19:244. [PMID: 34784928 PMCID: PMC8594068 DOI: 10.1186/s12915-021-01170-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/20/2020] [Accepted: 10/21/2021] [Indexed: 11/26/2022] Open
Abstract
Background: Mammals possess unique hearing capacities that differ significantly from those of the rest of the amniotes. In order to gain insights into the evolution of the mammalian inner ear, we aim to identify the set of genetic changes and the evolutionary forces that underlie this process. We hypothesize that genes that impair hearing when mutated in humans or in mice (hearing loss (HL) genes) must play important roles in the development and physiology of the inner ear and may have been targets of selective forces across the evolution of mammals. Additionally, we investigated if these HL genes underwent a human-specific evolutionary process that could underlie the evolution of phenotypic traits that characterize human hearing. Results: We compiled a dataset of HL genes including non-syndromic deafness genes identified by genetic screenings in humans and mice. We found that many genes including those required for the normal function of the inner ear such as LOXHD1, TMC1, OTOF, CDH23, and PCDH15 show strong signatures of positive selection. We also found numerous noncoding accelerated regions in HL genes, and among them, we identified active transcriptional enhancers through functional enhancer assays in transgenic zebrafish. Conclusions: Our results indicate that the key inner ear genes and regulatory regions underwent adaptive evolution in the basal branch of mammals and along the human-specific branch, suggesting that they could have played an important role in the functional remodeling of the cochlea. Altogether, our data suggest that morphological and functional evolution could be attained through molecular changes affecting both coding and noncoding regulatory regions.
14
Evolutionary forces in diabetes and hypertension pathogenesis in Africans. Hum Mol Genet 2021; 30:R110-R118. [PMID: 33734377 DOI: 10.1093/hmg/ddaa238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2020] [Revised: 10/16/2020] [Accepted: 10/22/2020] [Indexed: 11/12/2022] Open
Abstract
Rates of type 2 diabetes (T2D) and hypertension are increasing rapidly in urbanizing sub-Saharan Africa (SSA). While lifestyle factors drive the increases in T2D and hypertension prevalence, evidence across populations shows that genetic variation, which is driven by evolutionary forces including the natural selection that shaped the human genome, also plays a role. Here we report the evidence for the effect of selection in African genomes on mechanisms underlying T2D and hypertension, including energy metabolism, adipose tissue biology, insulin action and salt retention. Selection effects found for variants in genes PPARA and TCF7L2 may have enabled Africans to respond to nutritional challenges by altering carbohydrate and lipid metabolism. Likewise, African-ancestry-specific characteristics of adipose tissue biology (low visceral adipose tissue [VAT], high intermuscular adipose tissue and a strong association between VAT and adiponectin) may have been selected for in response to nutritional and infectious disease challenges in the African environment. Evidence for selection effects on insulin action, including insulin resistance and secretion, has been found for several genes including MPHOSPH9, TMEM127, ZRANB3 and MC3R. These effects may have been historically adaptive in critical conditions, such as famine and inflammation. A strong correlation between hypertension susceptibility variants and latitude supports the hypothesis of selection for salt retention mechanisms in warm, humid climates. Nevertheless, adaptive genomics studies in African populations are scarce. More work is needed, particularly genomics studies covering the wide diversity of African populations in SSA and Africans in the diaspora, as well as further functional assessment of established risk loci.
15
Abstract
Evolutionary processes, including mutation, migration and natural selection, have influenced the prevalence and distribution of various disorders in humans. However, despite a few well-known examples, such as the APOL1 variants - which have undergone positive genetic selection for their ability to confer resistance to Trypanosoma brucei infection but confer a higher risk of chronic kidney disease - little is known about the effects of evolutionary processes that have shaped genetic variation on kidney disease. An understanding of basic concepts in evolutionary genetics provides an opportunity to consider how findings from ancient and archaic genomes could inform our knowledge of evolution and provide insights into how population migration and genetic admixture have shaped the current distribution and landscape of human kidney-associated diseases. Differences in exposures to infectious agents, environmental toxins, dietary components and climate also have the potential to influence the evolutionary genetics of kidneys. Of note, selective pressure on loci associated with kidney disease is often from non-kidney diseases, and thus it is important to understand how the link between genome-wide selected loci and kidney disease occurs in relation to secondary nephropathies.
16
Network and Evolutionary Analysis of Human Epigenetic Regulators to Unravel Disease Associations. Genes (Basel) 2020; 11:genes11121457. [PMID: 33291839 PMCID: PMC7761991 DOI: 10.3390/genes11121457] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2020] [Revised: 11/29/2020] [Accepted: 11/30/2020] [Indexed: 12/15/2022] Open
Abstract
We carried out a system-level analysis of epigenetic regulators (ERs) and detailed the protein–protein interaction (PPI) network characteristics of disease-associated ERs. We found that most diseases associated with ERs can be clustered into two large groups, cancer diseases and developmental diseases. ER genes formed a highly interconnected PPI subnetwork, indicating a high tendency to interact and agglomerate with one another. We used the disease module detection (DIAMOnD) algorithm to expand the PPI subnetworks into a comprehensive cancer disease ER network (CDEN) and developmental disease ER network (DDEN). Using the transcriptome from early mouse developmental stages, we identified the gene co-expression modules significantly enriched for the CDEN and DDEN gene sets, which indicated the stage-dependent roles of ER-related disease genes during early embryonic development. The evolutionary rate and phylogenetic age distribution analysis indicated that the evolution of CDEN and DDEN genes was mostly constrained, and these genes exhibited older evolutionary age. Our analysis of human polymorphism data revealed that genes belonging to DDEN and Seed-DDEN were more likely to show signs of recent positive selection in human history. This finding suggests a potential association between positive selection of ERs and risk of developmental diseases through the mechanism of antagonistic pleiotropy.
17
Identifying chromosomal subpopulations based on their recombination histories advances the study of the genetic basis of phenotypic traits. Genome Res 2020; 30:1802-1814. [PMID: 33203765 PMCID: PMC7706724 DOI: 10.1101/gr.258301.119] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/16/2019] [Accepted: 10/22/2020] [Indexed: 02/06/2023]
Abstract
Recombination is a main source of genetic variability. However, the potential role of the variation generated by recombination in phenotypic traits, including diseases, remains unexplored because there is currently no method to infer chromosomal subpopulations based on recombination pattern differences. We developed recombClust, a method that uses SNP-phased data to detect differences in historic recombination in a chromosome population. We validated our method by performing simulations and by using real data to accurately predict the alleles of well-known recombination modifiers, including common inversions in Drosophila melanogaster and human, and the chromosomes under selective pressure at the lactase locus in humans. We then applied recombClust to the complex human 1q21.1 region, where nonallelic homologous recombination produces deleterious phenotypes. We discovered and validated the presence of two different recombination histories in these regions that significantly associated with the differential expression of ANKRD35 in whole blood and that were in high linkage with variants previously associated with hypertension. By detecting differences in historic recombination, our method opens a way to assess the influence of recombination variation in phenotypic traits.
|
18
|
Genome (in)stability at tandem repeats. Semin Cell Dev Biol 2020; 113:97-112. [PMID: 33109442 DOI: 10.1016/j.semcdb.2020.10.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/06/2020] [Revised: 09/26/2020] [Accepted: 10/10/2020] [Indexed: 12/12/2022]
Abstract
Repeat sequences account for over half of the human genome and represent a significant source of variation that underlies physiological and pathological states. Yet their study has been hindered by limitations in short-read sequencing technology and difficulties in assembly. An important category of repetitive DNA in the human genome comprises tandem repeats (TRs), in which repetitive units are arranged in a head-to-tail pattern. Compared to other regions of the genome, TRs carry a 10- to 10,000-fold higher mutation rate. There are several mutagenic mechanisms that can give rise to this propensity toward instability, but their precise contribution remains speculative. Given the high degree of homology between these sequences and their arrangement in tandem, once damaged, TRs have an intrinsic propensity to undergo aberrant recombination with non-allelic exchange and generate harmful rearrangements that may undermine the stability of the entire genome. The dynamic mutagenesis at TRs has been found to underlie individual polymorphism associated with neurodegenerative and neuromuscular disorders, as well as complex genetic diseases like cancer and diabetes. Here, we review our current understanding of the surveillance and repair mechanisms operating within these regions, and we describe how alterations in these protective processes can readily trigger mutational signatures found at TRs, ultimately resulting in the pathological correlation between TR instability and human diseases. Finally, we provide a viewpoint to counter the detrimental effects that TRs pose in light of their selection and conservation, as important drivers of human evolution.
|
19
|
Single-cell expression and Mendelian randomization analyses identify blood genes associated with lifespan and chronic diseases. Commun Biol 2020; 3:206. [PMID: 32358504 PMCID: PMC7195437 DOI: 10.1038/s42003-020-0937-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/06/2019] [Accepted: 04/10/2020] [Indexed: 12/13/2022] Open
Abstract
The human lifespan is a heritable trait that is intricately linked to the development of disorders. Here, we show that genetic associations for the parental lifespan are enriched in open chromatin of blood cells. Using blood expression quantitative trait loci (eQTL) derived from 31,684 samples, we identified 125 cis- and 559 trans-regulated expressed genes (eGenes) for the lifespan, enriched in adaptive and innate responses. Analysis of blood single-cell expression data showed that eGenes were enriched in dendritic cells (DCs), and the modelling of cell ligand-receptor interactions predicted crosstalk between DCs and a cluster of monocytes with a signature of cytotoxicity. In two-sample Mendelian randomization (MR), we identified 16 blood cis-eGenes causally associated with the lifespan. In MR, the majority of cis-eGene-disorder association pairs had concordant effects with the lifespan. The present work underlines that the lifespan is linked with the immune response and identifies eGenes associated with the lifespan and disorders.
|
20
|
The 26th annual Nucleic Acids Research database issue and Molecular Biology Database Collection. Nucleic Acids Res 2019; 47:D1-D7. [PMID: 30626175 PMCID: PMC6323895 DOI: 10.1093/nar/gky1267] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 01/01/2023] Open
Abstract
The 2019 Nucleic Acids Research (NAR) Database Issue contains 168 papers spanning molecular biology. Among them, 64 are new and another 92 are updates describing resources that appeared in the Issue previously. The remaining 12 are updates on databases most recently published elsewhere. This Issue contains two Breakthrough articles, on the Virtual Metabolic Human (VMH) database which links human and gut microbiota metabolism with diet and disease, and Vibrism DB, a database of mouse brain anatomy and gene (co-)expression with sophisticated visualization and session sharing. Major returning nucleic acid databases include RNAcentral, miRBase and LncRNA2Target. Protein sequence databases include UniProtKB, InterPro and Pfam, while wwPDB and RCSB cover protein structure. STRING and KEGG update in the section on metabolism and pathways. Microbial genomes are covered by IMG/M and resources for human and model organism genomics include Ensembl, UCSC Genome Browser, GENCODE and FlyBase. Genomic variation and disease are well covered by GWAS Catalog, PopHumanScan, OMIM and COSMIC, CADD being another major newcomer. Major new proteomics resources reporting here include iProX and jPOSTdb. The entire database issue is freely available online on the NAR website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 506 entries, adding 66 new resources and eliminating 147 discontinued URLs, bringing the current total to 1613 databases. It is available at http://www.oxfordjournals.org/nar/database/c.
|