1
Kumar H, Panigrahi M, Panwar A, Rajawat D, Nayak SS, Saravanan KA, Kaisa K, Parida S, Bhushan B, Dutt T. Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data. J Comput Biol 2022; 29:943-960. [PMID: 35639362 DOI: 10.1089/cmb.2021.0447] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022] Open
Abstract
Natural selection has been given a lot of attention because it relates to the adaptation of populations to their environments, both biotic and abiotic. An allele is selected when it is favored by natural selection. Consequently, the favored allele increases in frequency in the population and neighboring linked variation diminishes, causing so-called selective sweeps. A high-throughput genomic sequence allows one to disentangle the evolutionary forces at play in populations. With the development of high-throughput genome sequencing technologies, it has become easier to detect these selective sweeps/selection signatures. Various methods can be used to detect selective sweeps, from simple implementations using summary statistics to complex statistical approaches. One of the important problems of these statistical models is the potential to provide inaccurate results when their assumptions are violated. The use of machine learning (ML) in population genetics has been introduced as an alternative method of detecting selection by treating the problem of detecting selection signatures as a classification problem. Since the availability of population genomics data is increasing, researchers may incorporate ML into these statistical models to infer signatures of selection with higher predictive accuracy and better resolution. This article describes how ML can be used to aid in detecting and studying natural selection patterns using population genomic data.
Affiliation(s)
- Harshit Kumar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Manjit Panigrahi
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Anuradha Panwar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Divya Rajawat
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Sonali Sonejita Nayak
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- K A Saravanan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Kaiho Kaisa
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Subhashree Parida
- Divisions of Pharmacology and Toxicology, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Bharat Bhushan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Triveni Dutt
- Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, India
2
Isildak U, Stella A, Fumagalli M. Distinguishing between recent balancing selection and incomplete sweep using deep neural networks. Mol Ecol Resour 2021; 21:2706-2718. [PMID: 33749134 DOI: 10.1111/1755-0998.13379] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/22/2020] [Revised: 03/01/2021] [Accepted: 03/05/2021] [Indexed: 12/12/2022]
Abstract
Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance to predict loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly on the matrix of haplotypes. We found that both architectures have high accuracy to identify loci under recent selection. CNN generally outperformed ANN to distinguish between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false-positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to familial Mediterranean fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterize signals of selection on intermediate frequency variants, an analysis currently inaccessible by commonly used strategies.
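As a rough illustration of the kind of input the abstract describes for the ANN (summary statistics computed on a locus from a haplotype matrix), the sketch below computes the number of segregating sites, nucleotide diversity, Watterson's estimator, and Tajima's D. The specific statistics, the function name `summary_stats`, and the toy matrix are assumptions chosen for illustration; the abstract does not say which statistics the authors used.

```python
from math import sqrt


def summary_stats(haplotypes):
    """Compute basic summary statistics from a 0/1 haplotype matrix.

    haplotypes: list of equal-length lists (rows = haplotypes, columns = sites),
    with 1 marking one allele and 0 the other. Hypothetical helper, not the
    authors' code.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])

    # Count of allele "1" at each site; keep only segregating sites.
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    seg = [c for c in counts if 0 < c < n]
    s = len(seg)

    # Nucleotide diversity: mean number of pairwise differences.
    pi = sum(2 * c * (n - c) for c in seg) / (n * (n - 1))

    # Watterson's estimator.
    a1 = sum(1 / i for i in range(1, n))
    theta_w = s / a1

    # Tajima's D (standard variance constants).
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    d = (pi - theta_w) / sqrt(var) if var > 0 else float("nan")

    return {"S": s, "pi": pi, "theta_w": theta_w, "tajima_d": d}


# Toy matrix (made-up data): 4 haplotypes, 3 sites.
toy = [[0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
print(summary_stats(toy))
```

An excess of intermediate-frequency variants pushes Tajima's D above zero, which is one reason a single statistic struggles to separate balancing selection from an incomplete sweep and why the abstract turns to learned classifiers.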
Affiliation(s)
- Ulas Isildak
- Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Alessandro Stella
- Laboratory of Medical Genetics, Department of Biomedical Sciences and Human Oncology, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Matteo Fumagalli
- Department of Life Sciences, Silwood Park Campus, Imperial College London, London, UK
3
Hoh BP, Abdul Rahman T, Yusoff K. Natural selection and local adaptation of blood pressure regulation and their perspectives on precision medicine in hypertension. Hereditas 2019; 156:1. [PMID: 30636949 PMCID: PMC6323824 DOI: 10.1186/s41065-019-0080-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/18/2018] [Accepted: 01/01/2019] [Indexed: 01/09/2023] Open
Abstract
Prevalence of hypertension (HTN) varies substantially across different populations. HTN is not only common - affecting at least one third of the world's adult population - but is also the most important driver for cardiovascular diseases. Yet up to a third of hypertensive patients are resistant to therapy, contributed by secondary hypertension but more commonly the hitherto inability to precisely predict response to specific antihypertensive agents. Population and individual genomics information could be useful in guiding the selection and predicting the response to treatment - an approach known as precision medicine. However this cannot be achieved without the knowledge of genetic variations that influence blood pressure (BP). A number of evolutionary factors including population demographics and forces of natural selection may be involved. This article explores some ideas on how natural selection influences BP regulation in ethnically and geographically diverse populations that could lead to them being susceptible to HTN. We explore how such evolutionary factors could impact the implementation of precision medicine in HTN. Finally, in order to ensure the success of precision medicine in HTN, we call for more initiatives to understand the genetic architecture within and between diverse populations with ancestry from different parts of the world, and to precisely classify the intermediate phenotypes of HTN.
Affiliation(s)
- Boon-Peng Hoh
- Faculty of Medicine and Health Sciences, UCSI University, Cheras, 56000 Kuala Lumpur, Malaysia; Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai, 200031 China
- Thuhairah Abdul Rahman
- Clinical Pathology Diagnostic Centre Research Laboratory, Faculty of Medicine, Universiti Teknologi MARA, Sungai Buloh Campus, 47000 Sungai Buloh, Selangor, Malaysia
- Khalid Yusoff
- Faculty of Medicine and Health Sciences, UCSI University, Cheras, 56000 Kuala Lumpur, Malaysia
4
Sun C, Huo D, Southard C, Nemesure B, Hennis A, Cristina Leske M, Wu SY, Witonsky DB, Olopade OI, Di Rienzo A. A signature of balancing selection in the region upstream to the human UGT2B4 gene and implications for breast cancer risk. Hum Genet 2011; 130:767-75. [PMID: 21660508 PMCID: PMC4478588 DOI: 10.1007/s00439-011-1025-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 01/19/2011] [Accepted: 05/28/2011] [Indexed: 10/18/2022]
Abstract
UDP-glucuronosyltransferase 2 family, polypeptide B4 (UGT2B4) is an important metabolizing enzyme involved in the clearance of many xenobiotics and endogenous substrates, especially steroid hormones and bile acids. The HapMap data show that numerous SNPs upstream of UGT2B4 are in near-perfect linkage disequilibrium with each other and occur at intermediate frequency, indicating that this region might contain a target of natural selection. To investigate this possibility, we chose three regions (4.8 kb in total) for resequencing and observed a striking excess of intermediate-frequency alleles that define two major haplotypes separated by many mutation events and with little differentiation across populations, thus suggesting that the variation pattern upstream UGT2B4 is highly unusual and may be the result of balancing selection. We propose that this pattern is due to the maintenance of a regulatory polymorphism involved in the fine tuning of UGT2B4 expression so that heterozygous genotypes result in optimal enzyme levels. Considering the important role of steroid hormones in breast cancer susceptibility, we hypothesized that variation in this region could predispose to breast cancer. To test this hypothesis, we genotyped tag SNP rs13129471 in 1,261 patients and 825 normal women of African ancestry from three populations. The frequency comparison indicated that rs13129471 was significantly associated with breast cancer after adjusting for ethnicity [P = 0.003; heterozygous odds ratio (OR) 1.02, 95% confidence interval (CI) 0.81-1.28; homozygous OR 1.50, 95% CI 1.15-1.95]. Our results provide new insights into UGT2B4 sequence variation and indicate that a signal of natural selection may lead to the identification of disease susceptibility variants.
Affiliation(s)
- Chang Sun
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA
5
Haplotype variation in the ACE gene in global populations, with special reference to India, and an alternative model of evolution of haplotypes. THE HUGO JOURNAL 2011. [PMID: 23205163 DOI: 10.1007/s11568-011-9153-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]
Abstract
Angiotensin-I-converting enzyme (ACE) is known to be associated with human cardiovascular and psychiatric pathophysiology. We have undertaken a global survey of the haplotypes in the ACE gene to study diversity and to draw inferences on the nature of selective forces that may be operating on this gene. We have investigated the haplotype profiles reconstructed using polymorphisms in the regulatory (rs4277405, rs4459609, rs1800764, rs4292, rs4291), exonic (rs4309, rs4331, rs4343), and intronic (rs4340; Alu [I/D]) regions covering 17.8 kb of the ACE gene. We genotyped these polymorphisms in a large number of individuals drawn from 15 Indian ethnic groups and estimated haplotype frequencies. We compared the Indian data with available data from other global populations. Globally, five major haplotypes were observed. High-frequency haplotypes comprising mismatching alleles at the loci considered were seen in all populations. The three most frequent haplotypes among Africans were distinct from the major haplotypes of other world populations. We have studied in detail the evolution of the two major haplotypes (TATATTGIA and CCCTCCADG), one of which contains an Alu insertion (I) and the other a deletion (D), and which are seen most frequently among Caucasians (68%), non-African HapMap populations (65-88%), and Indian populations (70-95%). The two major haplotypes among Caucasians are reported to represent two distinct clades A and B. Earlier studies have postulated that a third clade C (represented by the haplotypes TACATCADG and TACATCADA) arose from an ancestral recombination event between A and B. We find that a more parsimonious explanation is that clades A and B have arisen by recombination between haplotypes belonging to clade C and a high-frequency African haplotype CCCTTCGIA. The haplotypes, which according to our hypothesis are the putative non-recombinants (PuNR), are uncommon in all non-African populations (frequency range 0-12%). Conversely, the frequencies of the putative recombinant haplotypes (PuR) are very low in the African populations (2-8%), indicating that the recombination event is likely to be ancient and arose before, perhaps shortly prior to, the global dispersal of modern humans. The global frequency spectrum of the PuR and the PuNR is difficult to explain only by drift. It appears likely that the ACE gene has been undergoing a combination of different selective pressures. Electronic supplementary material: The online version of this article (doi:10.1007/s11568-011-9153-6) contains supplementary material, which is available to authorized users.
6
Fiuza-Luces C, Ruiz JR, Rodríguez-Romo G, Santiago C, Gómez-Gallego F, Cano-Nieto A, Garatachea N, Rodríguez-Moreno I, Morán M, Lucia A. Is the ACE I/D polymorphism associated with extreme longevity? A study on a Spanish cohort. J Renin Angiotensin Aldosterone Syst 2011; 12:202-7. [DOI: 10.1177/1470320310391505] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022] Open
Abstract
The 287 bp Ins(I)/Del(D) polymorphism [rs1799752] in intron 16 of the angiotensin-converting enzyme (ACE) gene has been associated with extreme longevity (≥ 100 years) in some Caucasian and Asian cohorts, but this finding was not corroborated in other reports. We compared the allelic/genotypic frequency of the ACE I/D polymorphism among centenarians (N = 64, 100-108 years, 89.1% female) and nonagenarians (N = 47, 90-97 years, 76.6% female), and a control group of healthy young adults (n = 434, age 20-40 years, 50% female). All participants were of the same Caucasian (Spanish) descent. The ACE I/D genotype met Hardy-Weinberg expectations in all the cohorts. Allelic and genotypic frequencies did not differ by sex in any of the study groups (all p > 0.2). There were no differences in allelic or genotypic frequencies between groups, for example the frequency of the D allele was 62.3% in controls vs. 65.3% in the elderly (64.8% in centenarians). In summary, the ACE I/D polymorphism is not significantly associated with extreme longevity in the Spanish population. Further research is, however, necessary using other approaches. It also remains to be determined if the interaction of ACE genotypes with some other genetic variants exerts a potential effect on longevity.
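The Hardy-Weinberg check mentioned in the abstract can be sketched as a chi-square goodness-of-fit test on genotype counts. This is a minimal illustration only: the function name `hwe_chi_square` and the genotype counts are hypothetical (the abstract reports allele frequencies, not genotype counts), and the authors' actual test procedure is not stated.

```python
def hwe_chi_square(n_ii, n_id, n_dd):
    """Chi-square statistic for Hardy-Weinberg proportions at a biallelic locus.

    n_ii, n_id, n_dd: observed counts of the II, ID, and DD genotypes.
    Returns (frequency of the D allele, chi-square statistic with 1 df).
    Hypothetical helper for illustration.
    """
    n = n_ii + n_id + n_dd
    p = (2 * n_ii + n_id) / (2 * n)  # frequency of the I allele
    q = 1 - p                        # frequency of the D allele

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ii, n_id, n_dd)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return q, chi2


# Made-up genotype counts, not taken from the study.
q_d, chi2 = hwe_chi_square(60, 200, 174)
# 3.841 is the 5% critical value of the chi-square distribution with 1 df.
print(f"D allele frequency = {q_d:.3f}, chi2 = {chi2:.3f}, "
      f"consistent with HWE at 5%: {chi2 < 3.841}")
```

For small cohorts such as the centenarian group, an exact test is often preferred over the chi-square approximation.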
Affiliation(s)
- Jonatan R Ruiz
- Department of Biosciences and Nutrition at NOVUM, Unit for Preventive Nutrition, Karolinska Institutet, Stockholm, Sweden
- Nuria Garatachea
- Faculty of Health and Sport Science, University of Zaragoza, Huesca, Spain
- María Morán
- Centro de Investigación Hospital 12 de Octubre and CIBERER, Madrid, Spain