1
Vahedi SM, Ardestani SS. FSTest: an efficient tool for cross-population fixation index estimation on variant call format files. J Genet 2024; 103:04. [PMID: 38258299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Fixation index (Fst) statistics provide critical insights into evolutionary processes affecting the structure of genetic variation within and among populations. Fst statistics have been widely applied in population and evolutionary genetics to identify genomic regions targeted by selection pressures. The FSTest 1.3 software was developed to estimate four Fst statistics (Hudson, Weir and Cockerham, Nei, and Wright) using high-throughput genotyping or sequencing data. Here, we introduce FSTest 1.3 and compare its performance with two widely used software packages, VCFtools 0.1.16 and PLINK 2.0. Chromosome 1 of 1000 Genomes Phase III variant data belonging to South Asian (n = 211) and African (n = 274) populations was included as an example case in this study. Different Fst estimates were calculated for each single-nucleotide polymorphism (SNP) in a pairwise comparison of South Asian against African populations, and the results of FSTest 1.3 were confirmed by VCFtools 0.1.16 and PLINK 2.0. Two different sliding window approaches, one based on a fixed number of SNPs and another based on a fixed number of base pairs (bp), were conducted using FSTest 1.3 and VCFtools 0.1.16. Our results showed that regions with low coverage genotypic data could lead to an overestimation of Fst in sliding window analysis using a fixed number of bp. FSTest 1.3 could mitigate this challenge by estimating the average of consecutive SNPs along the chromosome. FSTest 1.3 allows direct analysis of VCF files with a small amount of code and can calculate Fst estimates on a desktop computer for more than a million SNPs in a few minutes. FSTest 1.3 is freely available at https://github.com/similab/FSTest.
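The contrast between the two windowing schemes described in this abstract can be illustrated with a minimal Python sketch. This is not FSTest's code. It assumes Hudson's estimator in the per-SNP form popularised by Bhatia et al. (2013), with sample sizes counted in chromosomes, and all function names are hypothetical. A bp-based window in a sparsely genotyped region may rest on a single SNP, whereas a SNP-count window always averages the same number of sites.

```python
from statistics import mean


def hudson_fst(p1, n1, p2, n2):
    """Per-SNP Hudson Fst (assumed form) from allele frequencies p1, p2
    and sample sizes n1, n2 counted in chromosomes."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # The estimator is undefined when the site is fixed for the same allele in both samples.
    return num / den if den > 0 else None


def snp_windows(values, size, step):
    """Mean of per-SNP values over windows holding a fixed number of SNPs."""
    out = []
    for start in range(0, len(values) - size + 1, step):
        window = [v for v in values[start:start + size] if v is not None]
        out.append(mean(window) if window else None)
    return out


def bp_windows(positions, values, width):
    """Mean of per-SNP values in consecutive windows of `width` base pairs.

    Returns {window_start: (mean_value, number_of_snps)} so that windows
    resting on very few SNPs can be spotted.
    """
    bins = {}
    for pos, v in zip(positions, values):
        if v is not None:
            bins.setdefault(pos // width, []).append(v)
    return {k * width: (mean(vs), len(vs)) for k, vs in sorted(bins.items())}
```

Reporting the SNP count alongside each bp window is one simple way to flag windows whose mean depends on a single site.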
Affiliation(s)
- Seyed Milad Vahedi
  - Department of Animal Science and Aquaculture, Dalhousie University, Bible Hill, NS B2N5E3,
2
Kumar L, Farias K, Prakash S, Mishra A, Mustak MS, Rai N, Thangaraj K. Dissecting the genetic history of the Roman Catholic populations of West Coast India. Hum Genet 2021; 140:1487-1498. [PMID: 34424406 DOI: 10.1007/s00439-021-02346-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/09/2021] [Accepted: 08/12/2021] [Indexed: 10/20/2022]
Abstract
Migration and admixture history of populations has long been an intriguing theme. The West Coast of India harbours rich diversity, with various ethno-linguistic groups, many of which have a well-documented history of migration. The Roman Catholics are one such distinct group, whose origin has been much debated. Some historians and anthropologists relate them to the ancient group of Gaud Saraswat Brahmins, while others regard them as members of the Jewish Lost Tribes who migrated to India in the first century. Historical records suggest that this community was later forcibly converted to Christianity by the Portuguese in Goa during the sixteenth century. To date, no genetic study has been done on this group to infer its origin and genetic affinity. Hence, we analysed 110 Roman Catholics from three different locations on the West Coast of India, including Goa, Kumta and Mangalore, using both uniparental and autosomal markers to understand their genetic history. We found that the Roman Catholics have close affinity with the Indo-European linguistic groups, particularly Brahmins. Additionally, we detected a genetic signal of Jews in the linkage disequilibrium-based admixture analysis, which was absent in other Indo-European populations who inhabit the same geographical regions. Haplotype-based analysis suggests that the Roman Catholics consist of South Asian-specific ancestry and showed high drift. Ancestry-specific historical population size estimation points to a possible bottleneck around the time of Goan inquisition (fifteenth century). Analysis of the Roman Catholics data along with ancient DNA data from the Neolithic and Bronze Age revealed that the Roman Catholics fit well in a basic model of ancient ancestral composition, typical of most of the Indo-European caste groups of India. Mitochondrial DNA (mtDNA) analysis suggests that most of the Roman Catholics have aboriginal Indian maternal genetic ancestry, while the Y chromosomal DNA analysis indicates a high frequency of the R1a lineage, which is predominant in groups with a higher ancestral North Indian (ANI) component. Therefore, we conclude that the Roman Catholics of the Goa, Kumta and Mangalore regions are the remnants of very early lineages of the Brahmin community of India, having Indo-European genetic affinity along with cryptic Jewish admixture, which needs to be explored further.
Affiliation(s)
- Lomous Kumar
  - CSIR-Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, Telangana, 500007, India
- Kranti Farias
  - Canadian Institute for Jewish Research, Montreal, Canada
- Satya Prakash
  - CSIR-Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, Telangana, 500007, India
- Anshuman Mishra
  - Institute of Advanced Materials, IAAM, Gammalkilsvägen 18, 590 53, Ulrika, Sweden
- Mohammed S Mustak
  - Department of Applied Zoology, Mangalore University, Mangalore, 574199, India
- Niraj Rai
  - Birbal Sahni Institute of Palaeosciences, Uttar Pradesh, 53 University Road, Lucknow, 226007, India
- Kumarasamy Thangaraj
  - CSIR-Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, Telangana, 500007, India
  - DBT-Centre for DNA Fingerprinting and Diagnostics, Uppal, Hyderabad, 500007, India
3
Skead K, Ang Houle A, Abelson S, Agbessi M, Bruat V, Lin B, Soave D, Shlush L, Wright S, Dick J, Morris Q, Awadalla P. Interacting evolutionary pressures drive mutation dynamics and health outcomes in aging blood. Nat Commun 2021; 12:4921. [PMID: 34389724 PMCID: PMC8363714 DOI: 10.1038/s41467-021-25172-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/16/2020] [Accepted: 07/27/2021] [Indexed: 01/10/2023]
Abstract
Age-related clonal hematopoiesis (ARCH) is characterized by age-associated accumulation of somatic mutations in hematopoietic stem cells (HSCs) or their pluripotent descendants. HSCs harboring driver mutations will be positively selected and cells carrying these mutations will rise in frequency. While ARCH is a known risk factor for blood malignancies, such as Acute Myeloid Leukemia (AML), why some people who harbor ARCH driver mutations do not progress to AML remains unclear. Here, we model the interaction of positive and negative selection in deeply sequenced blood samples from individuals who subsequently progressed to AML, compared to healthy controls, using deep learning and population genetics. Our modeling allows us to discriminate amongst evolutionary classes with high accuracy and captures signatures of purifying selection in most individuals. Purifying selection, acting on benign or mildly damaging passenger mutations, appears to play a critical role in preventing disease-predisposing clones from rising to dominance and is associated with longer disease-free survival. Through exploring a range of evolutionary models, we show how different classes of selection shape clonal dynamics and health outcomes thus enabling us to better identify individuals at a high risk of malignancy.
Affiliation(s)
- Kimberly Skead
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Armande Ang Houle
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Sagi Abelson
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Vanessa Bruat
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
- Boxi Lin
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
- David Soave
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
  - Department of Mathematics, Wilfrid Laurier University, Waterloo, ON, Canada
- Liran Shlush
  - Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Stephen Wright
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- John Dick
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
- Quaid Morris
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Philip Awadalla
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
4
Marin WM, Dandekar R, Augusto DG, Yusufali T, Heyn B, Hofmann J, Lange V, Sauter J, Norman PJ, Hollenbach JA. High-throughput Interpretation of Killer-cell Immunoglobulin-like Receptor Short-read Sequencing Data with PING. PLoS Comput Biol 2021; 17:e1008904. [PMID: 34339413 PMCID: PMC8360517 DOI: 10.1371/journal.pcbi.1008904] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/18/2021] [Revised: 08/12/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
The killer-cell immunoglobulin-like receptor (KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as having bearing on pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes make short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING provides KIR gene copy number classification functionality for all KIR genes through use of a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To determine copy number and genotype determination accuracy, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. 
Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding in interpretation of short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering high-resolution KIR genotypes that are highly accurate, enabling high-quality, high-throughput KIR genotyping for disease and population studies.
Affiliation(s)
- Wesley M. Marin
  - UCSF Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, California, United States of America
- Ravi Dandekar
  - UCSF Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, California, United States of America
- Danillo G. Augusto
  - UCSF Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, California, United States of America
- Tasneem Yusufali
  - UCSF Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, California, United States of America
- Paul J. Norman
  - Division of Biomedical Informatics and Personalized Medicine, and Department of Immunology and Microbiology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Jill A. Hollenbach
  - UCSF Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, California, United States of America
5
Leonenko G, Baker E, Stevenson-Hoare J, Sierksma A, Fiers M, Williams J, de Strooper B, Escott-Price V. Identifying individuals with high risk of Alzheimer's disease using polygenic risk scores. Nat Commun 2021; 12:4506. [PMID: 34301930 PMCID: PMC8302739 DOI: 10.1038/s41467-021-24082-z] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Received: 12/28/2020] [Accepted: 06/02/2021] [Indexed: 11/09/2022]
Abstract
Polygenic Risk Scores (PRS) for Alzheimer's disease (AD) offer unique possibilities for reliable identification of individuals at high and low risk of AD. However, there is little agreement in the field as to what approach should be used for genetic risk score calculations, how to model the effect of APOE, what the optimal p-value threshold (pT) for SNP selection is and how to compare scores between studies and methods. We show that the best prediction accuracy is achieved with a model with two predictors (APOE and PRS excluding the APOE region) with pT<0.1 for SNP selection. Prediction accuracy in a sample across different PRS approaches is similar, but individuals' scores and their associated ranking differ. We show that standardising PRS against the population mean, as opposed to the sample mean, makes the individuals' scores comparable between studies. Our work highlights the best strategies for polygenic profiling when assessing individuals for AD risk.
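The point about standardisation can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the reference mean and standard deviation, and the toy scores are all hypothetical. The idea shown is only that a z-score computed against a cohort's own mean depends on who else is in the cohort, whereas a z-score computed against fixed population reference values does not.

```python
from statistics import mean, pstdev


def standardise(scores, ref_mean=None, ref_sd=None):
    """Z-score raw PRS values.

    With no reference supplied, the sample's own mean and SD are used,
    so the result depends on cohort composition. With a (hypothetical)
    population reference mean and SD, the same raw score always maps to
    the same standardised value.
    """
    mu = mean(scores) if ref_mean is None else ref_mean
    sd = pstdev(scores) if ref_sd is None else ref_sd
    return [(s - mu) / sd for s in scores]


# The same individual (raw score 3.0) appears in two toy cohorts.
cohort_a = [1.0, 3.0]
cohort_b = [3.0, 5.0]

# Sample-based standardisation: the shared individual gets opposite z-scores.
z_a_sample = standardise(cohort_a)
z_b_sample = standardise(cohort_b)

# Population-based standardisation (assumed reference mean 0.0, SD 2.0):
# the shared individual gets the same z-score in both cohorts.
z_a_pop = standardise(cohort_a, ref_mean=0.0, ref_sd=2.0)
z_b_pop = standardise(cohort_b, ref_mean=0.0, ref_sd=2.0)
```

In this toy case the shared individual scores +1 in the first cohort and -1 in the second under sample-based standardisation, but 1.5 in both under the population reference.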
Affiliation(s)
- Ganna Leonenko
  - UK Dementia Research Institute, Cardiff University, Cardiff, UK
- Emily Baker
  - UK Dementia Research Institute, Cardiff University, Cardiff, UK
- Annerieke Sierksma
  - VIB Center for Brain & Disease Research, Leuven, Belgium
  - Laboratory for the Research of Neurodegenerative Diseases, Department of Neurosciences, Leuven Brain Institute (LBI), KU Leuven (University of Leuven), Leuven, Belgium
- Mark Fiers
  - VIB Center for Brain & Disease Research, Leuven, Belgium
  - Laboratory for the Research of Neurodegenerative Diseases, Department of Neurosciences, Leuven Brain Institute (LBI), KU Leuven (University of Leuven), Leuven, Belgium
  - UK Dementia Research Institute, University College London, London, UK
- Julie Williams
  - UK Dementia Research Institute, Cardiff University, Cardiff, UK
  - Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Bart de Strooper
  - VIB Center for Brain & Disease Research, Leuven, Belgium
  - Laboratory for the Research of Neurodegenerative Diseases, Department of Neurosciences, Leuven Brain Institute (LBI), KU Leuven (University of Leuven), Leuven, Belgium
  - UK Dementia Research Institute, University College London, London, UK
- Valentina Escott-Price
  - UK Dementia Research Institute, Cardiff University, Cardiff, UK
  - Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
6
Abstract
As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.
Affiliation(s)
- William S DeWitt
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
- Kameron Decker Harris
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195
  - Department of Biology, University of Washington, Seattle, WA 98195
- Aaron P Ragsdale
  - National Laboratory of Genomics for Biodiversity, Unit of Advanced Genomics, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Mexico 36821
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
7
Abstract
FST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies and heritability estimation. The most frequently-used estimators of FST and kinship are method-of-moments estimators whose accuracies depend strongly on the existence of simple underlying forms of structure, such as the independent subpopulations model of non-overlapping, independently evolving subpopulations. However, modern data sets have revealed that these simple models of structure likely do not hold in many populations, including humans. In this work, we analyze the behavior of these estimators in the presence of arbitrarily-complex population structures, which results in an improved estimation framework specifically designed for arbitrary population structures. After generalizing the definition of FST to arbitrary population structures and establishing a framework for assessing bias and consistency of genome-wide estimators, we calculate the accuracy of existing FST and kinship estimators under arbitrary population structures, characterizing biases and estimation challenges unobserved under their originally-assumed models of structure. We then present our new approach, which consistently estimates kinship and FST when the minimum kinship value in the dataset is estimated consistently. We illustrate our results using simulated genotypes from an admixture model, constructing a one-dimensional geographic scenario that departs nontrivially from the independent subpopulations model. Our simulations reveal the potential for severe biases in estimates of existing approaches that are overcome by our new framework. This work may significantly improve future analyses that rely on accurate kinship and FST estimates. 
Kinship coefficients and FST, which measure relatedness and population structure, respectively, are important quantities needed to accurately perform various analyses on genetic data, including genome-wide association studies and heritability estimation. However, existing estimators require restrictive assumptions of independence that are not met by real human and other datasets. In this work we find that existing estimators can be severely biased under reasonable scenarios, first by theoretically determining their properties, and then using an admixture simulation to illustrate our findings. In particular, we find that existing FST estimators are downwardly biased, and that existing kinship matrix estimators have related biases that are on average downward and of similar magnitude but vary for every pair of individuals. These insights led us to a new estimation framework for kinship and FST that is practically unbiased for any population structure, as demonstrated by theory and simulations. Our new approaches—available as open-source R packages—are easy to use and are more widely applicable than existing approaches, and they are likely to improve downstream analyses that require accurate kinship and FST estimates.
Affiliation(s)
- Alejandro Ochoa
  - Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina, United States of America
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- John D. Storey
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
8
Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK, Rophina M, Jolly B, Batra A, Sharma S, Siwach S, Jadhao AG, Palande N, Jha GN, Ashrafi N, Mishra PK, A. K. V, Jain S, Dash D, Kumar NS, Vanlallawma A, Sarma R, Chhakchhuak L, Kalyanaraman S, Mahadevan R, Kandasamy S, B. M. P, Rajagopal RE, J. ER, P. ND, Bajaj A, Gupta V, Mathew S, Goswami S, Mangla M, Prakash S, Joshi K, S. S, Gajjar D, Soraisham R, Yadav R, Devi YS, Gupta A, Mukerji M, Ramalingam S, B. K. B, Scaria V, Sivasubbu S. IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes. Nucleic Acids Res 2021; 49:D1225-D1232. [PMID: 33095885 PMCID: PMC7778947 DOI: 10.1093/nar/gkaa923] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/10/2020] [Revised: 10/01/2020] [Accepted: 10/22/2020] [Indexed: 12/15/2022]
Abstract
With the advent of next-generation sequencing, large-scale initiatives for mining whole genomes and exomes have been employed to better understand global or population-level genetic architecture. India encompasses more than 17% of the world population with extensive genetic diversity, but is under-represented in the global sequencing datasets. This gave us the impetus to perform and analyze the whole genome sequencing of 1029 healthy Indian individuals under the pilot phase of the 'IndiGen' program. We generated a compendium of 55,898,122 single allelic genetic variants from geographically distinct Indian genomes and calculated the allele frequency, allele count and allele number, along with the number of heterozygous or homozygous individuals. In the present study, these variants were systematically annotated using publicly available population databases and can be accessed through a browsable online database named 'IndiGenomes' (http://clingen.igib.res.in/indigen/). The IndiGenomes database will help clinicians and researchers in exploring the genetic component underlying medical conditions. To date, this is the most comprehensive genetic variant resource for the Indian population and is made freely available for academic utility. The resource has also been accessed extensively by the worldwide community since its launch.
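The per-variant quantities named in this abstract are related by simple counting: for a biallelic site, allele frequency is allele count divided by allele number. A minimal Python sketch follows. It is not the IndiGen pipeline; it assumes diploid genotypes coded as pairs of 0 (reference) and 1 (alternate), with `None` marking a missing call, and the function name is hypothetical.

```python
def allele_summary(genotypes):
    """Summarise one biallelic variant.

    genotypes: iterable of (a1, a2) pairs, 0 = reference, 1 = alternate,
    None = missing call (the whole genotype is skipped).
    """
    ac = an = het = hom_alt = 0
    for a1, a2 in genotypes:
        if a1 is None or a2 is None:
            continue  # missing calls contribute nothing to AN
        an += 2            # two called alleles per diploid individual
        ac += a1 + a2      # number of alternate alleles
        if a1 != a2:
            het += 1
        elif a1 == 1:
            hom_alt += 1
    return {
        "AC": ac,
        "AN": an,
        "AF": ac / an if an else None,
        "het": het,
        "hom_alt": hom_alt,
    }


summary = allele_summary([(0, 1), (1, 1), (0, 0), (None, None)])
```

Skipping missing calls keeps the allele number equal to the number of alleles actually observed, so the frequency is not diluted by uncalled genotypes.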
Affiliation(s)
- Abhinav Jain
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Rahul C Bhoyar
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Kavita Pandhare
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Anushree Mishra
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Disha Sharma
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Mohamed Imran
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Vigneshwar Senthivel
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Mohit Kumar Divakar
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Mercy Rophina
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Bani Jolly
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Arushi Batra
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Sumit Sharma
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Sanjay Siwach
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Arun G Jadhao
  - Department of Zoology, RTM Nagpur University, Nagpur, Maharashtra 440033, India
- Nikhil V Palande
  - Department of Zoology, Shri Mathuradas Mohota College of Science, Nagpur, Maharashtra 440009, India
- Ganga Nath Jha
  - Department of Anthropology, Vinoba Bhave University, Hazaribag, Jharkhand 825301, India
- Nishat Ashrafi
  - Department of Anthropology, Vinoba Bhave University, Hazaribag, Jharkhand 825301, India
- Prashant Kumar Mishra
  - Department of Biotechnology, Vinoba Bhave University, Hazaribag, Jharkhand 825301, India
- Vidhya A. K.
  - Department of Biochemistry, Dr. Kongu Science and Art College, Erode, Tamil Nadu 638107, India
- Suman Jain
  - Thalassemia and Sickle cell Society, Hyderabad, Telangana 500052, India
- Debasis Dash
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Andrew Vanlallawma
  - Department of Biotechnology, Mizoram University, Aizawl, Mizoram 796004, India
- Ranjan Jyoti Sarma
  - Department of Biotechnology, Mizoram University, Aizawl, Mizoram 796004, India
- Radha Mahadevan
  - TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Sunitha Kandasamy
  - TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Pabitha B. M.
  - TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Ezhil Ramya J.
  - TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Nirmala Devi P.
  - TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Anjali Bajaj
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Vishu Gupta
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Samatha Mathew
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Sangam Goswami
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Mohit Mangla
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Savinitha Prakash
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Kandarp Joshi
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Sreedevi S.
  - Department of Microbiology, St.Pious X Degree & PG College for Women, Hyderabad, Telangana 500076, India
- Devarshi Gajjar
  - Department of Microbiology, The Maharaja Sayajirao University of Baroda, Vadodara, Gujarat 390002, India
- Ronibala Soraisham
  - Department of Dermatology, Venereology and Leprology, Regional Institute of Medical Sciences, Imphal, Manipur 795004, India
- Rohit Yadav
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Yumnam Silla Devi
  - CSIR- North East Institute of Science and Technology, Jorhat, Assam 785006, India
- Aayush Gupta
  - Department of Dermatology, Dr. D.Y. Patil Medical College, Pune, Maharashtra 411018, India
- Mitali Mukerji
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Sivaprakash Ramalingam
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Binukumar B. K.
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Vinod Scaria
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Sridhar Sivasubbu
  - CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
9
|
Alfraih F, Alawwami M, Aljurf M, Alhumaidan H, Alsaedi H, El Fakih R, Alotaibi B, Rasheed W, Bernas SN, Massalski C, Heidl A, Sauter J, Lange V, Schmidt AH. High-resolution HLA allele and haplotype frequencies of the Saudi Arabian population based on 45,457 individuals and corresponding stem cell donor matching probabilities. Hum Immunol 2020; 82:97-102. [PMID: 33388178 DOI: 10.1016/j.humimm.2020.12.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/19/2020] [Revised: 12/09/2020] [Accepted: 12/10/2020] [Indexed: 11/20/2022]
Abstract
We estimated HLA allele and haplotype frequencies of the Saudi Arabian population from a sample of 45,457 registered stem cell donors. The most frequent HLA alleles were A*02:01g (18.5%), C*06:02g (16.1%), B*51:01g (14.1%), DRB1*07:01g (16.2%), DQB1*02:01g (30.5%), and DPB1*04:01g (33.6%). The most frequent 5-locus haplotypes were A*02:05g~C*06:02g~B*50:01g~DRB1*07:01g~DQB1*02:01g (1.73%), A*02:01g~C*06:02g~B*50:01g~DRB1*07:01g~DQB1*02:01g (1.66%), and A*26:01g~C*07:02g~B*08:01g~DRB1*03:01g~DQB1*02:01g (1.38%). Furthermore, we used the calculated haplotype frequencies to estimate stem cell donor matching probabilities for Saudi Arabian donor and patient populations under various matching requirements. These results are relevant for strategic donor registry planning in the Kingdom of Saudi Arabia.
Affiliation(s)
- Feras Alfraih
  - King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Moheeb Alawwami
  - King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Mahmoud Aljurf
  - King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Hind Alhumaidan
  - King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Hawazen Alsaedi
  - King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Riad El Fakih
  - King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Bander Alotaibi
  - King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Walid Rasheed
  - King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia

10
Llanes A, Ortiz L, Moscoso J, Gutiérrez G, Blake E, Restrepo CM, Lleonart R, Cuero C, Vernaza-Kwiers A. HLA allele and haplotype frequencies in the Panamanian population. Hum Immunol 2020; 82:5-7. [PMID: 33303214 DOI: 10.1016/j.humimm.2020.11.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2020] [Revised: 11/20/2020] [Accepted: 11/25/2020] [Indexed: 11/18/2022]
Abstract
In this study, we report for the first time HLA allele and haplotype frequencies in the modern Panamanian population at a two-field (four digits) resolution level. Reported frequencies were calculated from genotype data for the HLA-A, -B, -C, -DPB1, -DQB1 and -DRB1 loci of 462 healthy unrelated Panamanian adults of Hispanic ethnicity. In addition to providing new insights on the allelic structure of the Panamanian population and its origin, these data are critical for better planning of healthcare strategies in the country and for future research exploring the association with certain chronic and infectious diseases.
Affiliation(s)
- Alejandro Llanes
  - Centro de Biología Celular y Molecular de Enfermedades, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Ciudad del Saber, Clayton, Panama, Panama
- Luis Ortiz
  - Laboratorio Nacional de Trasplante, Complejo Hospitalario Dr. Arnulfo Arias Madrid (CHDrAAM), Caja de Seguro Social, Panama, Panama
- Juan Moscoso
  - Laboratorio Nacional de Trasplante, Complejo Hospitalario Dr. Arnulfo Arias Madrid (CHDrAAM), Caja de Seguro Social, Panama, Panama
- Gina Gutiérrez
  - Laboratorio Nacional de Trasplante, Complejo Hospitalario Dr. Arnulfo Arias Madrid (CHDrAAM), Caja de Seguro Social, Panama, Panama
- Elena Blake
  - Laboratorio Nacional de Trasplante, Complejo Hospitalario Dr. Arnulfo Arias Madrid (CHDrAAM), Caja de Seguro Social, Panama, Panama
- Carlos M Restrepo
  - Centro de Biología Celular y Molecular de Enfermedades, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Ciudad del Saber, Clayton, Panama, Panama
- Ricardo Lleonart
  - Centro de Biología Celular y Molecular de Enfermedades, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Ciudad del Saber, Clayton, Panama, Panama
- Cesar Cuero
  - Servicio de Nefrología, Complejo Hospitalario Dr. Arnulfo Arias Madrid (CHDrAAM), Caja de Seguro Social, Panama, Panama
- Alejandro Vernaza-Kwiers
  - Laboratorio Nacional de Trasplante, Complejo Hospitalario Dr. Arnulfo Arias Madrid (CHDrAAM), Caja de Seguro Social, Panama, Panama.

11
Vianna R, Hanhoerderster L, Cardoso J, Cristóvão Porto L. NGS-based typings for 7 HLA loci in three populations from Rio de Janeiro, RJ, Brazil. Hum Immunol 2020; 82:3-4. [PMID: 33267971 DOI: 10.1016/j.humimm.2020.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2020] [Revised: 11/17/2020] [Accepted: 11/18/2020] [Indexed: 11/19/2022]
Abstract
We investigated HLA class I (HLA-A, -B, and -C) and class II (HLA-DRB1, -DQB1, -DPA1, and -DPB1) alleles by NGS-based typing among 759 Brazilian individuals from three populations in the city of Rio de Janeiro, grouped by self-declared skin color (Caucasian, N = 521, AFND-ID: 3730; Parda, N = 170, AFND-ID: 3728; Black, N = 68, AFND-ID: 3727), to calculate allelic and haplotypic frequencies and linkage disequilibrium. Only the HLA-DRB1 locus deviated from Hardy-Weinberg equilibrium (in the Caucasian and Black populations). The three populations shared the most frequent allele at HLA-A, -C, -DRB1, -DPA1, and -DPB1. Genotype and frequency data are available in the Allele Frequencies Net Database.
Affiliation(s)
- Romulo Vianna
  - Laboratório de Histocompatibilidade e Criopreservação, Universidade do Estado do Rio de Janeiro, Avenida Marechal Rondon 381, 20950-003 Rio de Janeiro, Brazil.
- Leonardo Hanhoerderster
  - Laboratório de Histocompatibilidade e Criopreservação, Universidade do Estado do Rio de Janeiro, Avenida Marechal Rondon 381, 20950-003 Rio de Janeiro, Brazil
- Juliana Cardoso
  - Laboratório de Histocompatibilidade e Criopreservação, Universidade do Estado do Rio de Janeiro, Avenida Marechal Rondon 381, 20950-003 Rio de Janeiro, Brazil
- Luís Cristóvão Porto
  - Laboratório de Histocompatibilidade e Criopreservação, Universidade do Estado do Rio de Janeiro, Avenida Marechal Rondon 381, 20950-003 Rio de Janeiro, Brazil

12
Wang SQ. Genetic diversity and population structure of the endangered species Paeonia decomposita endemic to China and implications for its conservation. BMC Plant Biol 2020; 20:510. [PMID: 33167894 PMCID: PMC7650209 DOI: 10.1186/s12870-020-02682-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/03/2019] [Accepted: 10/01/2020] [Indexed: 05/19/2023]
Abstract
BACKGROUND: Paeonia decomposita, endemic to China, has important ornamental, medicinal, and economic value and is regarded as an endangered plant. The genetic diversity and population structure have seldom been described. A conservation management plan is not currently available. RESULTS: In the present study, 16 pairs of simple sequence repeat (SSR) primers were used to evaluate the genetic diversity and population structure. A total of 122 alleles were obtained with a mean of 7.625 alleles per locus. The expected heterozygosity (He) varied from 0.043 to 0.901 (mean 0.492) in 16 primers. Moderate genetic diversity (He = 0.405) among populations was revealed, with Danba identified as the center of genetic diversity. Mantel tests revealed a positive correlation between geographic and genetic distance among populations (r = 0.592, P = 0.0001), demonstrating consistency with the isolation by distance model. Analysis of molecular variance (AMOVA) indicated that the principal molecular variance existed within populations (73.48%) rather than among populations (26.52%). Bayesian structure analysis and principal coordinate analysis (PCoA) supported the classification of the populations into three clusters. CONCLUSIONS: This is the first study of the genetic diversity and population structure of P. decomposita using SSR. Three management units were proposed as conservation measures. The results will be beneficial for the conservation and exploitation of the species, providing a theoretical basis for further research of its evolution and phylogeography.
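The Mantel test mentioned in this abstract can be illustrated with a short sketch. The code below is a generic one-sided permutation Mantel test written for illustration only; the function name `mantel_test`, the toy distance matrices, and the permutation count are assumptions and are not taken from the study.

```python
import numpy as np

def mantel_test(dist_x, dist_y, n_perm=999, seed=0):
    """One-sided permutation Mantel test for two square distance matrices.

    Returns the Pearson correlation between the upper triangles and a
    permutation p-value for the hypothesis of a positive association.
    """
    x = np.asarray(dist_x, dtype=float)
    y = np.asarray(dist_y, dtype=float)
    n = x.shape[0]
    upper = np.triu_indices(n, k=1)              # pairs above the diagonal
    r_obs = np.corrcoef(x[upper], y[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        y_perm = y[np.ix_(perm, perm)]           # relabel rows and columns together
        if np.corrcoef(x[upper], y_perm[upper])[0, 1] >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 10 sampling sites with noisy isolation by distance.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(10, 2))
geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
noise = rng.normal(0, 5, size=geo.shape)
gen = geo + (noise + noise.T) / 2                # symmetric "genetic" distances
np.fill_diagonal(gen, 0.0)

r, p = mantel_test(geo, gen)
print(f"Mantel r = {r:.3f}, p = {p:.4f}")
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual entries would break the matrix structure and invalidate the null distribution.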
Affiliation(s)
- Shi-Quan Wang
  - Ministry of Education Key Laboratory for Ecology of Tropical Islands, Key Laboratory of Tropical Animal and Plant Ecology of Hainan Province, College of Life Sciences, Hainan Normal University, Haikou, 571158, China.

13
Abstract
Resources are rarely distributed uniformly within a population. Heterogeneity in the concentration of a drug, the quality of breeding sites, or wealth can all affect evolutionary dynamics. In this study, we represent a collection of properties affecting the fitness at a given location using a color. A green node is rich in resources while a red node is poorer. More colors can represent a broader spectrum of resource qualities. For a population evolving according to the birth-death Moran model, the first question we address is which structures, identified by graph connectivity and graph coloring, are evolutionarily equivalent. We prove that all properly two-colored, undirected, regular graphs are evolutionarily equivalent (where “properly colored” means that no two neighbors have the same color). We then compare the effects of background heterogeneity on properly two-colored graphs to those with alternative schemes in which the colors are permuted. Finally, we discuss dynamic coloring as a model for spatiotemporal resource fluctuations, and we illustrate that random dynamic colorings often diminish the effects of background heterogeneity relative to a proper two-coloring. Heterogeneity in environmental conditions can have profound effects on long-term evolutionary outcomes in structured populations. We consider a population evolving on a colored graph, wherein the color of a node represents the resources at that location. Using a combination of analytical and numerical methods, we quantify the effects of background heterogeneity on a population’s dynamics. In addition to considering the notion of an “optimal” coloring with respect to mutant invasion, we also study the effects of dynamic spatial redistribution of resources as the population evolves. Although the effects of static background heterogeneity can be quite striking, these effects are often attenuated by the movement (or “flow”) of the underlying resources.
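A birth-death Moran process on a properly two-colored graph can be sketched as a small simulation. The code below is only an illustration under stated assumptions: the graph is a cycle with alternating colors, residents have fitness 1 at every node, and the mutant's fitness at green and red nodes is supplied by the hypothetical `mutant_fitness` argument. The abstract does not specify how color maps to fitness, so this mapping is not the authors' model.

```python
import random

def fixation_probability(n_nodes, mutant_fitness, trials=2000, seed=0):
    """Estimate the fixation probability of a single mutant on a cycle.

    The cycle is properly two-colored (green, red, green, ...), so n_nodes
    must be even. mutant_fitness maps "green"/"red" to the mutant's fitness
    at a node of that color; residents have fitness 1 (an assumption).
    """
    if n_nodes < 4 or n_nodes % 2:
        raise ValueError("n_nodes must be an even number >= 4")
    rng = random.Random(seed)
    colors = ["green" if i % 2 == 0 else "red" for i in range(n_nodes)]
    fixed = 0
    for _ in range(trials):
        is_mutant = [False] * n_nodes
        is_mutant[rng.randrange(n_nodes)] = True     # one mutant, random node
        n_mut = 1
        while 0 < n_mut < n_nodes:
            # Birth: choose a parent with probability proportional to fitness.
            weights = [mutant_fitness[colors[i]] if is_mutant[i] else 1.0
                       for i in range(n_nodes)]
            parent = rng.choices(range(n_nodes), weights=weights)[0]
            # Death: the offspring replaces a uniformly chosen neighbor.
            child = (parent + rng.choice((-1, 1))) % n_nodes
            n_mut += int(is_mutant[parent]) - int(is_mutant[child])
            is_mutant[child] = is_mutant[parent]
        fixed += n_mut == n_nodes
    return fixed / trials

# Hypothetical comparison: neutral mutant vs. a mutant favored on green nodes.
neutral = fixation_probability(6, {"green": 1.0, "red": 1.0})
green_favored = fixation_probability(6, {"green": 1.5, "red": 1.0})
print(neutral, green_favored)
```

With all fitness values equal to 1 the estimate should be close to 1/N, which gives a quick sanity check before trying heterogeneous colorings.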
Affiliation(s)
- Kamran Kaveh
  - Department of Mathematics, Dartmouth College, Hanover, New Hampshire, United States
  - Corresponding author (KK)
- Alex McAvoy
  - Department of Mathematics, University of Pennsylvania, Philadelphia, Pennsylvania, United States
  - Corresponding author (AM)
- Martin A. Nowak
  - Department of Mathematics, Harvard University, Cambridge, Massachusetts, United States
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States

14
Watson JA, Taylor AR, Ashley EA, Dondorp A, Buckee CO, White NJ, Holmes CC. A cautionary note on the use of unsupervised machine learning algorithms to characterise malaria parasite population structure from genetic distance matrices. PLoS Genet 2020; 16:e1009037. [PMID: 33035220 PMCID: PMC7577480 DOI: 10.1371/journal.pgen.1009037] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/14/2020] [Revised: 10/21/2020] [Accepted: 08/08/2020] [Indexed: 11/20/2022] Open
Abstract
Genetic surveillance of malaria parasites supports malaria control programmes, treatment guidelines and elimination strategies. Surveillance studies often pose questions about malaria parasite ancestry (e.g. how antimalarial resistance has spread) and employ statistical methods that characterise parasite population structure. Many of the methods used to characterise structure are unsupervised machine learning algorithms which depend on a genetic distance matrix, notably principal coordinates analysis (PCoA) and hierarchical agglomerative clustering (HAC). PCoA and HAC are sensitive to both the definition of genetic distance and algorithmic specification. Importantly, neither algorithm infers malaria parasite ancestry. As such, PCoA and HAC can inform (e.g. via exploratory data visualisation and hypothesis generation), but not answer comprehensively, key questions about malaria parasite ancestry. We illustrate the sensitivity of PCoA and HAC using 393 Plasmodium falciparum whole genome sequences collected from Cambodia and neighbouring regions (where antimalarial resistance has emerged and spread recently) and we provide tentative guidance for the use and interpretation of PCoA and HAC in malaria parasite genetic epidemiology. This guidance includes a call for fully transparent and reproducible analysis pipelines that feature (i) a clearly outlined scientific question; (ii) a clear justification of analytical methods used to answer the scientific question along with discussion of any inferential limitations; (iii) publicly available genetic distance matrices when downstream analyses depend on them; and (iv) sensitivity analyses. To bridge the inferential disconnect between the output of non-inferential unsupervised learning algorithms and the scientific questions of interest, tailor-made statistical models are needed to infer malaria parasite ancestry. In the absence of such models speculative reasoning should feature only as discussion but not as results.
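The dependence of PCoA on the chosen distance definition can be seen in a minimal sketch of classical PCoA (Gower double-centring followed by an eigendecomposition). The function name `pcoa`, the random toy genotypes, and the two distance definitions below are illustrative assumptions and are unrelated to the 393 genomes analysed in the study.

```python
import numpy as np

def pcoa(dist, n_components=2):
    """Classical PCoA (metric MDS) of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering    # Gower double-centring
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical toy data: 6 samples x 40 biallelic sites coded 0/1.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 2, size=(6, 40))

# Two candidate "genetic distances" computed from the same genotypes.
mismatch = (genotypes[:, None, :] != genotypes[None, :, :]).mean(axis=-1)
root_mismatch = np.sqrt(mismatch)

coords_a = pcoa(mismatch)
coords_b = pcoa(root_mismatch)
print(coords_a.round(3))
print(coords_b.round(3))
```

The two coordinate sets come from identical genotypes yet will generally differ, which is the kind of sensitivity to the distance definition that the abstract warns about. Neither embedding is an estimate of ancestry.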
Affiliation(s)
- James A. Watson
  - Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
  - Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Aimee R. Taylor
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Elizabeth A. Ashley
  - Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
  - Lao-Oxford-Mahosot Hospital Wellcome Trust Research Unit, Vientiane, Laos
- Arjen Dondorp
  - Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
  - Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Caroline O. Buckee
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Nicholas J. White
  - Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
  - Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Chris C. Holmes
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
  - Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom

15
Abstract
Recent years have seen a growing number of studies investigating evolutionary questions using ancient DNA. To address these questions, one of the most frequently used methods is principal component analysis (PCA). When PCA is applied to temporal samples, however, the sample dates are ignored during analysis, leading to imperfect representations of samples in PC plots. Here, we present a factor analysis (FA) method in which individual scores are corrected for the effect of allele frequency drift over time. We obtained exact solutions for the estimates of corrected factors, and we provided a fast algorithm for their computation. Using computer simulations and ancient European samples, we compared geometric representations obtained from FA with PCA and with ancestry estimation programs. In admixture analyses, FA estimates agreed with tree-based statistics, and they were more accurate than those obtained from PCA projections and from ancestry estimation programs. A great advantage of FA over existing approaches is that it improves descriptive analyses of ancient DNA samples without requiring inclusion of outgroup or present-day samples.
Affiliation(s)
- Olivier François
  - Université Grenoble-Alpes, Centre National de la Recherche Scientifique, Grenoble INP, Laboratoire TIMC-IMAG UMR 5525, 38000, Grenoble, France.
- Flora Jay
  - Université Paris-Saclay, Centre National de la Recherche Scientifique, Inria, Laboratoire de Recherche en Informatique UMR 8623, Bâtiment 650 Ada Lovelace, 91405, Orsay Cedex, France.

16
Hejase HA, Dukler N, Siepel A. From Summary Statistics to Gene Trees: Methods for Inferring Positive Selection. Trends Genet 2020; 36:243-258. [PMID: 31954511 PMCID: PMC7177178 DOI: 10.1016/j.tig.2019.12.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/28/2019] [Revised: 11/15/2019] [Accepted: 12/11/2019] [Indexed: 01/01/2023]
Abstract
Methods to detect signals of natural selection from genomic data have traditionally emphasized the use of simple summary statistics. Here, we review a new generation of methods that consider combinations of conventional summary statistics and/or richer features derived from inferred gene trees and ancestral recombination graphs (ARGs). We also review recent advances in methods for population genetic simulation and ARG reconstruction. Finally, we describe opportunities for future work on a variety of related topics, including the genetics of speciation, estimation of selection coefficients, and inference of selection on polygenic traits. Together, these emerging methods offer promising new directions in the study of natural selection.
Affiliation(s)
- Hussein A Hejase
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
- Noah Dukler
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA

17
Charlesworth B. In defence of doing sums in genetics. Heredity (Edinb) 2019; 123:44-49. [PMID: 31189907 PMCID: PMC6781122 DOI: 10.1038/s41437-019-0195-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 11/08/2022] Open
Abstract
There has been a long history of the use of mathematics in genetics, ranging from the use of statistics to analyse genetic data to genetic models of evolutionary processes. Contemporary research into the genomic basis of disease and complex traits exemplifies the importance of statistical methods in genetics. Some examples of the development and application of population genetic models are described, which are intended to highlight the utility of such models for understanding variation and evolution in natural populations. The effects of selection on variability at sites linked to the targets of selection illustrate how fruitful interactions between theory and data can be.
Affiliation(s)
- Brian Charlesworth
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, Edinburgh, EH8 9FL, UK.

18
Parolin ML, Toscanini UF, Velázquez IF, Llull C, Berardi GL, Holley A, Tamburrini C, Avena S, Carnese FR, Lanata JL, Sánchez Carnero N, Arce LF, Basso NG, Pereira R, Gusmão L. Genetic admixture patterns in Argentinian Patagonia. PLoS One 2019; 14:e0214830. [PMID: 31206551 PMCID: PMC6576754 DOI: 10.1371/journal.pone.0214830] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/15/2019] [Accepted: 06/03/2019] [Indexed: 12/21/2022] Open
Abstract
As in other Latin American populations, Argentinians are the result of admixture amongst different continental groups, mainly from America and Europe, and to a lesser extent from Sub-Saharan Africa. However, it is known that the admixture processes did not occur homogeneously throughout the country. Therefore, considering its importance for anthropological, medical and forensic research, this study aimed to investigate the population genetic structure of Argentinian Patagonia through the analysis of 46 ancestry informative markers in 433 individuals from five different localities. Overall, in the Patagonian sample, the average individual ancestry was estimated as 35.8% Native American (95% CI: 32.2–39.4%), 62.1% European (58.5–65.7%) and 2.1% African (1.7–2.4%). Comparing the five localities studied, statistically significant differences were observed for the Native American and European contributions, but not for the African ancestry. The admixture results combined with the genealogical information revealed intra-regional variations that are consistent with the different geographic origins of the participants and their ancestors. As expected, a high European ancestry was observed for donors with four grandparents born in Europe (96.8%) or in the Central region of Argentina (85%). In contrast, the Native American ancestry increased when the four grandparents were born in the North (71%) or in the South (61.9%) regions of the country, or even in Chile (60.5%). In summary, our results showed that differences in continental ancestry contribution have different origins in each region of Patagonia, and even in each locality, highlighting the importance of knowing the origin of the participants and their ancestors for the correct interpretation and contextualization of the genetic information.
Affiliation(s)
- María Laura Parolin
  - Instituto de Diversidad y Evolución Austral (IDEAus), CCT CONICET-CENPAT, Puerto Madryn, Argentina
  - Corresponding author
- Ulises F. Toscanini
  - Primer Centro Argentino de Inmunogenética (PRICAI), Fundación Favaloro, Buenos Aires, Argentina
- Irina F. Velázquez
  - Instituto de Diversidad y Evolución Austral (IDEAus), CCT CONICET-CENPAT, Puerto Madryn, Argentina
- Cintia Llull
  - Primer Centro Argentino de Inmunogenética (PRICAI), Fundación Favaloro, Buenos Aires, Argentina
- Gabriela L. Berardi
  - Primer Centro Argentino de Inmunogenética (PRICAI), Fundación Favaloro, Buenos Aires, Argentina
- Alfredo Holley
  - Instituto de Diversidad y Evolución Austral (IDEAus), CCT CONICET-CENPAT, Puerto Madryn, Argentina
- Camila Tamburrini
  - Instituto de Diversidad y Evolución Austral (IDEAus), CCT CONICET-CENPAT, Puerto Madryn, Argentina
- Sergio Avena
  - Instituto de Ciencias Antropológicas (ICA), Facultad de Filosofía y Letras, Universidad de Buenos Aires, Buenos Aires, Argentina
  - Centro de Estudios Biomédicos, Biotecnológicos, Ambientales y Diagnóstico (CEBBAD), Universidad Maimónides, Buenos Aires, Argentina
- Francisco R. Carnese
  - Instituto de Ciencias Antropológicas (ICA), Facultad de Filosofía y Letras, Universidad de Buenos Aires, Buenos Aires, Argentina
- José L. Lanata
  - Instituto de Investigaciones en Diversidad Cultural y Procesos de Cambio (IIDyPCa), CONICET-UNRN, San Carlos de Bariloche, Argentina
- Noela Sánchez Carnero
  - Centro para el Estudio de Sistemas Marinos (CECIMAR), CCT CONICET-CENPAT, Puerto Madryn, Argentina
- Lucas F. Arce
  - Instituto de Diversidad y Evolución Austral (IDEAus), CCT CONICET-CENPAT, Puerto Madryn, Argentina
- Néstor G. Basso
  - Instituto de Diversidad y Evolución Austral (IDEAus), CCT CONICET-CENPAT, Puerto Madryn, Argentina
- Rui Pereira
  - Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Leonor Gusmão
  - Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
  - DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil

19
YAŞAR BİLGE NŞ, SARI İ, SOLMAZ D, ŞENEL S, EMMUNGİL H, KILIÇ L, YILMAZ ÖNER S, YILDIZ F, YILMAZ S, ERSÖZLU BOZKIRLI D, AYDIN TUFAN M, YILMAZ S, YAZISIZ V, PEHLİVAN Y, BES C, YILDIRIM ÇETİN G, ERTEN Ş, GÖNÜLLÜ E, ŞAHİN F, AKAR S, AKSU K, KALYONCU U, DİRESKENELİ H, ERKEN E, KISACIK B, SAYARLIOGLU M, ÇINAR M, KAŞİFOĞLU T. The distribution of MEFV mutations in Turkish FMF patients: multicenter study representing results of Anatolia. Turk J Med Sci 2019; 49:472-477. [PMID: 30887796 PMCID: PMC7018361 DOI: 10.3906/sag-1809-100] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 01/08/2023] Open
Abstract
Background/aim: The distribution of Mediterranean fever (MEFV) gene mutations in Turkish familial Mediterranean fever (FMF) patients varies according to geographic area of Turkey. There is a need for highly representative data for Turkish FMF patients. The aim of our study was to investigate the distribution of the common MEFV mutations in Turkish FMF patients in a nationwide, multicenter study. Materials and methods: Data of 2246 FMF patients from 15 adult rheumatology clinics located in different parts of the country were evaluated retrospectively. The following mutations were tested in all patients: M694V, M680I, M694I, V726A, and E148Q. Results: There were 1719 FMF patients with available genetic testing. According to the genotyping, homozygous M694V, present in 413 patients (24%), was the most common mutation. One hundred and fifty-four patients (9%) had no detectable mutations. Allele frequencies of common mutations were: M694V (n = 1529, 44.5%), M680I (n = 423, 12.3%), V726A (n = 315, 9.2%), E148Q (n = 214, 1%), and M694I (n = 12, <1%). Conclusion: In this large-scale multicenter study, we provided information about the frequencies of common MEFV gene mutations obtained from adult Turkish FMF patients. Nearly half of the patients were carrying at least one M694V mutation in their alleles.
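The allele counts reported in this abstract can be converted to frequencies with a few lines of arithmetic. The sketch below assumes the denominator is two alleles for each of the 1719 genotyped patients; the abstract does not state the denominator explicitly, so this is an assumption, and the helper name `allele_frequencies` is hypothetical.

```python
def allele_frequencies(allele_counts, n_patients, ploidy=2):
    """Return allele frequency per mutation as count / (ploidy * n_patients)."""
    total_alleles = ploidy * n_patients
    return {name: count / total_alleles for name, count in allele_counts.items()}

# Allele counts quoted in the abstract.
counts = {"M694V": 1529, "M680I": 423, "V726A": 315, "E148Q": 214, "M694I": 12}

# Assumption: all 1719 genotyped patients contribute two alleles each.
freqs = allele_frequencies(counts, n_patients=1719)
for name, freq in freqs.items():
    print(f"{name}: {100 * freq:.1f}%")
```

The printed percentages can be compared with the figures quoted in the abstract as a consistency check on the assumed denominator.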
Affiliation(s)
- N. Şule YAŞAR BİLGE
- Division of Rheumatology, Department of Internal Medicine, Eskişehir Osmangazi University, EskişehirTurkey
- * To whom correspondence should be addressed. E-mail:
| | - İsmail SARI
- Division of Rheumatology, Department of Internal Medicine, Dokuz Eylül University, İzmirTurkey
| | - Dilek SOLMAZ
- Division of Rheumatology, Department of Internal Medicine, Dokuz Eylül University, İzmirTurkey
| | - Soner ŞENEL
- Division of Rheumatology, Department of Internal Medicine, Erciyes University, KayseriTurkey
| | - Hakan EMMUNGİL
- Division of Rheumatology, Department of Internal Medicine, Ege University, İzmirTurkey
| | - Levent KILIÇ
- Division of Rheumatology, Department of Internal Medicine, Hacettepe University, AnkaraTurkey
| | - Sibel YILMAZ ÖNER
- Division of Rheumatology, Department of Internal Medicine, Marmara University, İstanbulTurkey
| | - Fatih YILDIZ
- Division of Rheumatology, Department of Internal Medicine, Çukurova University, AdanaTurkey
| | - Sedat YILMAZ
- Division of Rheumatology, Department of Internal Medicine, University of Health Sciences,Gülhane Faculty of Medicine, AnkaraTurkey
| | - Duygu ERSÖZLU BOZKIRLI
- Division of Rheumatology, Department of Internal Medicine, Adana Numune Education and Research Hospital, AdanaTurkey
| | - Müge AYDIN TUFAN
- Division of Rheumatology, Department of Internal Medicine, Adana Numune Education and Research Hospital, AdanaTurkey
| | - Sema YILMAZ
- Division of Rheumatology, Department of Internal Medicine, Selçuk University, KonyaTurkey
| | - Veli YAZISIZ
- Division of Rheumatology, Department of Internal Medicine, Şişli Etfal Education and Research Hospital, İstanbulTurkey
| | - Yavuz PEHLİVAN
- Division of Rheumatology, Department of Internal Medicine, Gaziantep University, GaziantepTurkey
| | - Cemal BES
- Division of Rheumatology, Department of Internal Medicine, Abant İzzet Baysal University, BoluTurkey
| | - Gözde YILDIRIM ÇETİN
- Division of Rheumatology, Department of Internal Medicine, Kahramanmaraş Sütçü İmam University, KahramanmaraşTurkey
| | - Şükran ERTEN
- Division of Rheumatology, Department of Internal Medicine, Yıldırım Beyazıt University, AnkaraTurkey
| | - Emel GÖNÜLLÜ
- Division of Rheumatology, Department of Internal Medicine, Eskişehir Osmangazi University, EskişehirTurkey
| | - Fezan ŞAHİN
- Department of Biostatistics, Eskişehir Osmangazi University, EskişehirTurkey
| | - Servet AKAR
- Division of Rheumatology, Department of Internal Medicine, Dokuz Eylül University, İzmirTurkey
| | - Kenan AKSU
- Division of Rheumatology, Department of Internal Medicine, Ege University, İzmirTurkey
| | - Umut KALYONCU
- Division of Rheumatology, Department of Internal Medicine, Hacettepe University, AnkaraTurkey
| | - Haner DİRESKENELİ
- Division of Rheumatology, Department of Internal Medicine, Marmara University, İstanbulTurkey
| | - Eren ERKEN
- Division of Rheumatology, Department of Internal Medicine, Çukurova University, AdanaTurkey
| | - Bünyamin KISACIK
- Division of Rheumatology, Department of Internal Medicine, Medical Park, GaziantepTurkey
| | - Mehmet SAYARLIOGLU
- Division of Rheumatology, Department of Internal Medicine, Kahramanmaraş Sütçü İmam University, KahramanmaraşTurkey
| | - Muhammed ÇINAR
- Division of Rheumatology, Department of Internal Medicine, University of Health Sciences,Gülhane Faculty of Medicine, AnkaraTurkey
| | - Timuçin KAŞİFOĞLU
- Division of Rheumatology, Department of Internal Medicine, Eskişehir Osmangazi University, EskişehirTurkey
|
20
|
Kuzenkov O, Morozov A. Towards the Construction of a Mathematically Rigorous Framework for the Modelling of Evolutionary Fitness. Bull Math Biol 2019; 81:4675-4700. [PMID: 30949887 PMCID: PMC6874698 DOI: 10.1007/s11538-019-00602-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/25/2017] [Accepted: 03/22/2019] [Indexed: 11/26/2022]
Abstract
Modelling of natural selection in self-replicating systems has been heavily influenced by the concept of fitness which was inspired by Darwin’s original idea of the survival of the fittest. However, so far the concept of fitness in evolutionary modelling is still somewhat vague, intuitive and often subjective. Unfortunately, as a result of this, using different definitions of fitness can lead to conflicting evolutionary outcomes. Here we formalise the definition of evolutionary fitness to describe the selection of strategies in deterministic self-replicating systems for generic modelling settings which involve an arbitrary function space of inherited strategies. Our mathematically rigorous definition of fitness is closely related to the underlying population dynamic equations which govern the selection processes. More precisely, fitness is defined based on the concept of the ranking of competing strategies which compares the long-term dynamics of measures of sets of inherited units in the space of strategies. We also formulate the variational principle of modelling selection which states that in a self-replicating system with inheritance, selection will eventually maximise evolutionary fitness. We demonstrate how expressions for evolutionary fitness can be derived for a class of models with age structuring including systems with delay, which has previously been considered as a challenge.
Affiliation(s)
- Oleg Kuzenkov
- Lobachevsky State University of Nizhni Novgorod, Nizhniy Novgorod, Russia
| | - Andrew Morozov
- Department of Mathematics, University of Leicester, Leicester, UK.
- Shirshov Institute of Oceanology, Moscow, Russia.
|
21
|
Wientjes YCJ, Calus MPL, Duenk P, Bijma P. Required properties for markers used to calculate unbiased estimates of the genetic correlation between populations. Genet Sel Evol 2018; 50:65. [PMID: 30547748 PMCID: PMC6295113 DOI: 10.1186/s12711-018-0434-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/28/2018] [Accepted: 11/28/2018] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND: Generally, populations differ in terms of environmental and genetic factors, which can create differences in allele substitution effects between populations. Therefore, a single genotype may have different additive genetic values in different populations. The correlation between the two additive genetic values of a single genotype in two populations is known as the additive genetic correlation between populations and thus, can differ from 1. Our objective was to investigate whether differences in linkage disequilibrium (LD) and allele frequencies of markers and causal loci between populations affect the bias of the estimated genetic correlation. We simulated two populations that were separated by 50 generations and differed in LD pattern between markers and causal loci, as measured by the LD-statistic r. We used a high marker density to represent a high consistency of LD between populations, and lower marker densities to represent situations with a lower consistency of LD between populations. Markers and causal loci were selected to have either similar or different allele frequencies in the two populations. RESULTS: Our results show that genetic correlations were underestimated only slightly when the difference in allele frequencies between the two populations was similar for the markers and the causal loci. A lower marker density, representing a lower consistency of LD between populations, had only a minor effect on the underestimation of the genetic correlation. When the difference in allele frequencies between the two populations was not similar for markers and causal loci, genetic correlations were severely underestimated. This bias occurred because the markers did not predict accurately the relationships at causal loci. CONCLUSIONS: For an unbiased estimation of the genetic correlation between populations, the markers should accurately predict the relationships at the causal loci. To achieve this, it is essential that the difference in allele frequencies between populations is similar for markers and causal loci. Our results show that differences in LD phase between causal loci and markers across populations have little effect on the estimated genetic correlation.
Affiliation(s)
- Yvonne C. J. Wientjes
- Animal Breeding and Genomics, Wageningen University and Research, 6700 AH Wageningen, The Netherlands
| | - Mario P. L. Calus
- Animal Breeding and Genomics, Wageningen University and Research, 6700 AH Wageningen, The Netherlands
| | - Pascal Duenk
- Animal Breeding and Genomics, Wageningen University and Research, 6700 AH Wageningen, The Netherlands
| | - Piter Bijma
- Animal Breeding and Genomics, Wageningen University and Research, 6700 AH Wageningen, The Netherlands
|
22
|
Abstract
Maximum avoidance of inbreeding (MAI) is a mating system in which mates are as distantly related as possible. Although theoretical aspects and applications of MAI in diploid populations have been studied by many researchers, extension of MAI to haplodiploid populations is an unresolved problem. In this paper, this problem is addressed, and the following conclusions are derived. For a haplodiploid population with a Fibonacci number of females, a set of mating systems (one cycle MAI-hd) can be defined that avoids inbreeding to the maximum after the set has been practiced for one cycle. But unlike MAI in diploid populations, repetition of one cycle MAI-hd cannot be MAI in the global range of generations. Numerical comparison with random mating and circular half-sib mating shows that, as in diploid populations, repetition of one cycle MAI-hd in haplodiploid populations attains a lower inbreeding coefficient in early generations at the expense of a higher asymptotic rate of inbreeding.
Affiliation(s)
- Tetsuro Nomura
- Department of Bioresource and Environmental Sciences, Faculty of Life Sciences, Kyoto Sangyo University, Kyoto 603-8555, Japan.
|
23
|
Abstract
We study the fixation probability of a mutant type when introduced into a resident population. We implement a stochastic competitive Lotka-Volterra model with two types and intra- and interspecific competition. The model further allows for stochastically varying population sizes. The competition coefficients are interpreted in terms of inverse payoffs emerging from an evolutionary game. Since our study focuses on the impact of the competition values, we assume the same net growth rate for both types. In this general framework, we derive a formula for the fixation probability [Formula: see text] of the mutant type under weak selection. We find that the most important parameter determining the invasion success of the mutant is its death rate due to competition with the resident. Furthermore, we compare our approximation to results obtained by implementing population size changes deterministically in order to explore the parameter regime of validity of our method. Finally, we put our formula in the context of classical evolutionary game theory and observe similarities and differences to the results obtained in that constant population size setting.
Affiliation(s)
- Peter Czuppon
- Department of Evolutionary Theory, Max-Planck Institute for Evolutionary Biology, Plön, Germany
| | - Arne Traulsen
- Department of Evolutionary Theory, Max-Planck Institute for Evolutionary Biology, Plön, Germany
|
24
|
Strillacci MG, Gorla E, Cozzi MC, Vevey M, Genova F, Scienski K, Longeri M, Bagnato A. A copy number variant scan in the autochthonous Valdostana Red Pied cattle breed and comparison with specialized dairy populations. PLoS One 2018; 13:e0204669. [PMID: 30261013 PMCID: PMC6160104 DOI: 10.1371/journal.pone.0204669] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/11/2018] [Accepted: 09/12/2018] [Indexed: 11/24/2022] Open
Abstract
Copy number variants (CNVs) are an important source of genomic structural variation, recognized to influence phenotypic variation in many species. Many studies have focused on identifying CNVs within and between human and livestock populations alike, but only a few have explored population-genetic properties in cattle based on CNVs derived from a high-density SNP array. We report a high-resolution CNV scan using Illumina’s 777k BovineHD Beadchip for Valdostana Red Pied (VRP), an autochthonous Italian dual-purpose cattle population reared in the Alps that did not undergo strong selection for production traits. After stringent quality control and filtering, CNVs were called across 108 bulls using the PennCNV software. A total of 6,784 CNVs were identified, summarized to 1,723 CNV regions (CNVRs) on 29 autosomes covering a total of ~59 Mb of the UMD3.1 assembly. Among the mapped CNVRs, there were 812 losses, 832 gains and 79 complexes. We subsequently performed a comparison of CNVs detected in the VRP and those available from published studies in the Italian Brown Swiss (IBS) and Mexican Holstein (HOL). A total of 171 CNVRs were common to all three breeds. Between VRP and IBS, 474 regions overlapped, while only 313 overlapped between VRP and HOL, indicating a more similar genetic background among populations with common origins, i.e. the Alps. The principal component, clustering and admixture analyses showed a clear separation of the three breeds into three distinct clusters. In order to describe the distribution of CNVs within and among breeds we used the pairwise VST statistic, considering only the CNVRs shared by more than 5 individuals (within breed). We identified unique and highly differentiated CNVs (n = 33), some of which could be due to specific breed selection and adaptation. Genes and QTL within these regions were characterized.
Affiliation(s)
| | - Erica Gorla
- Department of Veterinary Medicine, University of Milan, Milan, Italy
| | | | - Mario Vevey
- Associazione Nazionale Allevatori Bovini Di Razza Valdostana, Gressan, Aosta, Italy
| | - Francesca Genova
- Department of Veterinary Medicine, University of Milan, Milan, Italy
| | - Kathy Scienski
- Department of Animal Science, Texas A&M University, College Station, Texas, United States of America
| | - Maria Longeri
- Department of Veterinary Medicine, University of Milan, Milan, Italy
|
25
|
Abstract
A classic problem in population genetics is the characterization of discrete population structure in the presence of continuous patterns of genetic differentiation. Especially when sampling is discontinuous, the use of clustering or assignment methods may incorrectly ascribe differentiation due to continuous processes (e.g., geographic isolation by distance) to discrete processes, such as geographic, ecological, or reproductive barriers between populations. This reflects a shortcoming of current methods for inferring and visualizing population structure when applied to genetic data deriving from geographically distributed populations. Here, we present a statistical framework for the simultaneous inference of continuous and discrete patterns of population structure. The method estimates ancestry proportions for each sample from a set of two-dimensional population layers, and, within each layer, estimates a rate at which relatedness decays with distance. This thereby explicitly addresses the "clines versus clusters" problem in modeling population genetic variation, and remedies some of the overfitting to which nonspatial models are prone. The method produces useful descriptions of structure in genetic relatedness in situations where separated, geographically distributed populations interact, as after a range expansion or secondary contact. We demonstrate the utility of this approach using simulations and by applying it to empirical datasets of poplars and black bears in North America.
Affiliation(s)
- Gideon S Bradburd
- Ecology, Evolutionary Biology, and Behavior Graduate Group, Department of Integrative Biology, Michigan State University, East Lansing, Michigan 48824
| | - Graham M Coop
- Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, California 95616
| | - Peter L Ralph
- Institute of Ecology and Evolution, Departments of Mathematics and Biology, University of Oregon, Eugene, Oregon 97403
|
26
|
Abstract
The mutation-selection process is the most fundamental mechanism of evolution. In 1935, R. A. Fisher proved his fundamental theorem of natural selection, providing a model in which the rate of change of mean fitness is equal to the genetic variance of a species. Fisher did not include mutations in his model, but believed that mutations would provide a continual supply of variance resulting in perpetual increase in mean fitness, thus providing a foundation for neo-Darwinian theory. In this paper we re-examine Fisher's Theorem, showing that because it disregards mutations, and because it is invalid beyond one instant in time, it has limited biological relevance. We build a differential equations model from Fisher's first principles with mutations added, and prove a revised theorem showing the rate of change in mean fitness is equal to genetic variance plus a mutational effects term. We refer to our revised theorem as the fundamental theorem of natural selection with mutations. Our expanded theorem, and our associated analyses (analytic computation, numerical simulation, and visualization), provide a clearer understanding of the mutation-selection process, and allow application of biologically realistic parameters such as mutational effects. The expanded theorem has biological implications significantly different from what Fisher had envisioned.
Affiliation(s)
- William F Basener
- Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY, 14623, USA.
| | - John C Sanford
- Horticulture Section, NYSAES, 630 West North Street, Geneva, New York, 14456, USA
|
27
|
Key FM, Abdul-Aziz MA, Mundry R, Peter BM, Sekar A, D'Amato M, Dennis MY, Schmidt JM, Andrés AM. Human local adaptation of the TRPM8 cold receptor along a latitudinal cline. PLoS Genet 2018; 14:e1007298. [PMID: 29723195 PMCID: PMC5933706 DOI: 10.1371/journal.pgen.1007298] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 11/30/2017] [Accepted: 03/07/2018] [Indexed: 01/22/2023] Open
Abstract
Ambient temperature is a critical environmental factor for all living organisms. It was likely an important selective force as modern humans recently colonized temperate and cold Eurasian environments. Nevertheless, as of yet we have limited evidence of local adaptation to ambient temperature in populations from those environments. To shed light on this question, we exploit the fact that humans are a cosmopolitan species that inhabit territories under a wide range of temperatures. Focusing on cold perception, which is central to thermoregulation and survival in cold environments, we show evidence of recent local adaptation on TRPM8. This gene encodes a cation channel that is, to date, the only temperature receptor known to mediate an endogenous response to moderate cold. The upstream variant rs10166942 shows extreme population differentiation, with frequencies that range from 5% in Nigeria to 88% in Finland (placing this SNP in the 0.02% tail of the FST empirical distribution). When all populations are jointly analyzed, allele frequencies correlate with latitude and temperature beyond what can be explained by shared ancestry and population substructure. Using a Bayesian approach, we infer that the allele originated and evolved neutrally in Africa, while positive selection raised its frequency to different degrees in Eurasian populations, resulting in allele frequencies that follow a latitudinal cline. We infer strong positive selection, in agreement with ancient DNA showing high frequency of the allele in Europe 3,000 to 8,000 years ago. rs10166942 is important phenotypically because its ancestral allele is protective of migraine. This debilitating disorder varies in prevalence across human populations, with highest prevalence in individuals of European descent, precisely the population with the highest frequency of the rs10166942 derived allele. We thus hypothesize that local adaptation on previously neutral standing variation may have contributed to the genetic differences that exist in the prevalence of migraine among human populations today.
Affiliation(s)
- Felix M Key
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Muslihudeen A Abdul-Aziz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Roger Mundry
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Benjamin M Peter
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - Aarthi Sekar
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, California, United States of America
| | - Mauro D'Amato
- BioDonostia Health Research Institute and IKERBASQUE, Basque Foundation for Science, San Sebastian, Spain
| | - Megan Y Dennis
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, California, United States of America
| | - Joshua M Schmidt
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Aida M Andrés
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
|
28
|
Kim JS, Gao X, Rzhetsky A. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning. PLoS Comput Biol 2018; 14:e1006106. [PMID: 29698408 PMCID: PMC5940243 DOI: 10.1371/journal.pcbi.1006106] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/19/2017] [Revised: 05/08/2018] [Accepted: 03/20/2018] [Indexed: 11/18/2022] Open
Abstract
Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10⁻⁹). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.
Affiliation(s)
- Ji-Sung Kim
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
| | - Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
| | - Andrey Rzhetsky
- Institute for Genomics and Systems Biology, Computation Institute, Departments of Medicine and Human Genetics, University of Chicago, Chicago, Illinois, United States of America
|
29
|
Lemes RB, Nunes K, Carnavalli JEP, Kimura L, Mingroni-Netto RC, Meyer D, Otto PA. Inbreeding estimates in human populations: Applying new approaches to an admixed Brazilian isolate. PLoS One 2018; 13:e0196360. [PMID: 29689090 PMCID: PMC5916862 DOI: 10.1371/journal.pone.0196360] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/22/2018] [Accepted: 04/11/2018] [Indexed: 02/06/2023] Open
Abstract
The analysis of genomic data (~400,000 autosomal SNPs) enabled the reliable estimation of inbreeding levels in a sample of 541 individuals from a highly admixed Brazilian population isolate (an African-derived quilombo in the State of São Paulo). To achieve this, different methods were applied to the joint information of two sets of markers (one complete and another excluding loci in patent linkage disequilibrium). This strategy allowed the detection and exclusion of markers that biased the estimation of the average population inbreeding coefficient (Wright's fixation index FIS), whose value was eventually estimated at around 1% using any of the methods we applied. Quilombo demographic inferences were made by analyzing the structure of runs of homozygosity (ROH), which were adapted to cope with a highly admixed population with a complex foundation history. Our results suggest that the amount of ROH <2 Mb in admixed populations should be somehow proportional to the genetic contribution from each parental population.
Affiliation(s)
- Renan B. Lemes
- Department of Genetics and Evolutionary Biology, Instituto de Biociências, Universidade de São Paulo, São Paulo, São Paulo, Brazil
| | - Kelly Nunes
- Department of Genetics and Evolutionary Biology, Instituto de Biociências, Universidade de São Paulo, São Paulo, São Paulo, Brazil
| | - Juliana E. P. Carnavalli
- Department of Genetics and Evolutionary Biology, Instituto de Biociências, Universidade de São Paulo, São Paulo, São Paulo, Brazil
| | - Lilian Kimura
- Department of Genetics and Evolutionary Biology, Instituto de Biociências, Universidade de São Paulo, São Paulo, São Paulo, Brazil
| | - Regina C. Mingroni-Netto
- Department of Genetics and Evolutionary Biology, Instituto de Biociências, Universidade de São Paulo, São Paulo, São Paulo, Brazil
| | - Diogo Meyer
- Department of Genetics and Evolutionary Biology, Instituto de Biociências, Universidade de São Paulo, São Paulo, São Paulo, Brazil
| | - Paulo A. Otto
- Department of Genetics and Evolutionary Biology, Instituto de Biociências, Universidade de São Paulo, São Paulo, São Paulo, Brazil
|
30
|
Abstract
We study Eigen's quasispecies model in the asymptotic regime where the length of the genotypes goes to [Formula: see text] and the mutation probability goes to 0. A limiting infinite system of differential equations is obtained. We prove convergence of trajectories, as well as convergence of the equilibrium solutions. We give analogous results for a discrete-time version of Eigen's model, which coincides with a model proposed by Moran.
|
31
|
Soraggi S, Wiuf C, Albrechtsen A. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data. G3 (Bethesda) 2018; 8:551-566. [PMID: 29196497 PMCID: PMC5919751 DOI: 10.1534/g3.117.300192] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 08/24/2017] [Accepted: 11/27/2017] [Indexed: 02/07/2023]
Abstract
The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1-10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates.
Affiliation(s)
- Samuele Soraggi
- Department of Mathematical Sciences, Faculty of Science, University of Copenhagen, 2100, Denmark
| | - Carsten Wiuf
- Department of Mathematical Sciences, Faculty of Science, University of Copenhagen, 2100, Denmark
| | - Anders Albrechtsen
- Center for Bioinformatics, Faculty of Science, University of Copenhagen, 2100, Denmark
|
32
|
Fish AE, Crawford DC, Capra JA, Bush WS. Local ancestry transitions modify snp-trait associations. Pac Symp Biocomput 2018; 23:424-435. [PMID: 29218902 PMCID: PMC5728664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Genomic maps of local ancestry identify ancestry transitions - points on a chromosome where recent recombination events in admixed individuals have joined two different ancestral haplotypes. These events bring together alleles that evolved within separate continental populations, providing a unique opportunity to evaluate the joint effect of these alleles on health outcomes. In this work, we evaluate the impact of genetic variants in the context of nearby local ancestry transitions within a sample of nearly 10,000 adults of African ancestry with traits derived from electronic health records. Genetic data was located using the Metabochip, and used to derive local ancestry. We develop a model that captures the effect of both single variants and local ancestry, and use it to identify examples where local ancestry transitions significantly interact with nearby variants to influence metabolic traits. In our most compelling example, we find that the minor allele of rs16890640 occurring on a European background with a downstream local ancestry transition to African ancestry results in significantly lower mean corpuscular hemoglobin and volume. This finding represents a new way of discovering genetic interactions, and is supported by molecular data that suggest changes to local ancestry may impact local chromatin looping.
Affiliation(s)
- Alexandra E Fish
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37235, USA; Departments of Biological Sciences, Biomedical Informatics, and Computer Science, Vanderbilt University, Nashville, TN 37235, USA
|
33
|
Abstract
A major topic of interest in human prehistory is how the large-scale genetic structure of modern populations outside of Africa was established. Demographic models have been developed that capture the relationships among small numbers of populations or within particular geographical regions, but constructing a phylogenetic tree with gene flow events for a wide diversity of non-Africans remains a difficult problem. Here, we report a model that provides a good statistical fit to allele-frequency correlation patterns among East Asians, Australasians, Native Americans, and ancient western and northern Eurasians, together with archaic human groups. The model features a primary eastern/western bifurcation dating to at least 45,000 years ago, with Australasians nested inside the eastern clade, and a parsimonious set of admixture events. While our results still represent a simplified picture, they provide a useful summary of deep Eurasian population history that can serve as a null model for future studies and a baseline for further discoveries.
Affiliation(s)
- Mark Lipson
- Department of Genetics, Harvard Medical School, Boston, MA
| | - David Reich
- Department of Genetics, Harvard Medical School, Boston, MA
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA
|
34
|
Chen SY, Deng F, Huang Y, Li C, Liu L, Jia X, Lai SJ. PopSc: Computing Toolkit for Basic Statistics of Molecular Population Genetics Simultaneously Implemented in Web-Based Calculator, Python and R. PLoS One 2016; 11:e0165434. [PMID: 27792763 PMCID: PMC5085088 DOI: 10.1371/journal.pone.0165434] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/02/2016] [Accepted: 10/11/2016] [Indexed: 11/19/2022] Open
Abstract
Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is no efficient and easy-to-use toolkit available yet for exclusively focusing on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which could be categorized into three classes, including (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to the existing computer tools, PopSc was designed to directly accept the intermediate metadata, such as allele frequencies, rather than the raw DNA sequences or genotyping results. PopSc is first implemented as the web-based calculator with user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes the convenient and straightforward calculation of statistics in research. Additionally, we also provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages of population genetics analysis.
Affiliation(s)
- Shi-Yi Chen
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, 611130, China
| | - Feilong Deng
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, 611130, China
| | - Ying Huang
- College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China
| | - Cao Li
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, 611130, China
| | - Linhai Liu
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, 611130, China
| | - Xianbo Jia
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, 611130, China
| | - Song-Jia Lai
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, 611130, China
35
Nielsen R. Fumio Tajima and the Origin of Modern Population Genetics. Genetics 2016; 204:389-390. [PMID: 27729488 PMCID: PMC5068832 DOI: 10.1534/genetics.116.195271] [Indexed: 08/30/2023]
Affiliation(s)
- Rasmus Nielsen
- Departments of Integrative Biology and Statistics, University of California, Berkeley, California 94720 and Natural History Museum of Denmark, 1350 Copenhagen, Denmark
36
Robinson MR, Hemani G, Medina-Gomez C, Mezzavilla M, Esko T, Shakhbazov K, Powell JE, Vinkhuyzen A, Berndt SI, Gustafsson S, Justice AE, Kahali B, Locke AE, Pers TH, Vedantam S, Wood AR, van Rheenen W, Andreassen OA, Gasparini P, Metspalu A, van den Berg LH, Veldink JH, Rivadeneira F, Werge TM, Abecasis GR, Boomsma DI, Chasman DI, de Geus EJC, Frayling TM, Hirschhorn JN, Hottenga JJ, Ingelsson E, Loos RJF, Magnusson PKE, Martin NG, Montgomery GW, North KE, Pedersen NL, Spector TD, Speliotes EK, Goddard ME, Yang J, Visscher PM. Population genetic differentiation of height and body mass index across Europe. Nat Genet 2015; 47:1357-62. [PMID: 26366552 PMCID: PMC4984852 DOI: 10.1038/ng.3401] [Received: 06/26/2015] [Accepted: 08/19/2015] [Indexed: 12/13/2022]
Abstract
Across-nation differences in the mean values for complex traits are common, but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10^-8; BMI, P < 5.95 × 10^-4), and we find an among-population genetic correlation for tall and slender individuals (r = -0.80, 95% CI = -0.95, -0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).
Affiliation(s)
- Matthew R Robinson
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
| | - Gibran Hemani
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
| | - Carolina Medina-Gomez
- Department of Internal Medicine, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Massimo Mezzavilla
- Institute for Maternal and Child Health, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) 'Burlo Garofolo', Trieste, Italy
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | - Tonu Esko
- Estonian Genome Center, University of Tartu, Tartu, Estonia
- Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
| | - Konstantin Shakhbazov
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
| | - Joseph E Powell
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
- University of Queensland Diamantina Institute, University of Queensland, Translational Research Institute, Brisbane, Queensland, Australia
| | - Anna Vinkhuyzen
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
| | - Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, US National Institutes of Health, Bethesda, Maryland, USA
| | - Stefan Gustafsson
- Department of Medical Sciences, Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Anne E Justice
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Bratati Kahali
- Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Adam E Locke
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Tune H Pers
- Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
| | - Sailaja Vedantam
- Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Andrew R Wood
- Genetics of Complex Traits, University of Exeter Medical School, University of Exeter, Exeter, UK
| | - Wouter van Rheenen
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Ole A Andreassen
- Norwegian Centre for Mental Disorders Research (NORMENT), KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Paolo Gasparini
- Institute for Maternal and Child Health, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) 'Burlo Garofolo', Trieste, Italy
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | | | - Leonard H van den Berg
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Jan H Veldink
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Fernando Rivadeneira
- Department of Internal Medicine, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Thomas M Werge
- Institute of Biological Psychiatry, MHC Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, (iPSYCH), Aarhus, Denmark
| | - Goncalo R Abecasis
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Dorret I Boomsma
- Neuroscience Campus Amsterdam, VU University Medical Center, Amsterdam, the Netherlands
- EMGO+ Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
- Department of Biological Psychology, VU University Amsterdam, Amsterdam, the Netherlands
| | - Daniel I Chasman
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Eco J C de Geus
- Neuroscience Campus Amsterdam, VU University Medical Center, Amsterdam, the Netherlands
- EMGO+ Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
- Department of Biological Psychology, VU University Amsterdam, Amsterdam, the Netherlands
| | - Timothy M Frayling
- Genetics of Complex Traits, University of Exeter Medical School, University of Exeter, Exeter, UK
| | - Joel N Hirschhorn
- Estonian Genome Center, University of Tartu, Tartu, Estonia
- Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
| | - Jouke Jan Hottenga
- Neuroscience Campus Amsterdam, VU University Medical Center, Amsterdam, the Netherlands
- EMGO+ Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
- Department of Biological Psychology, VU University Amsterdam, Amsterdam, the Netherlands
| | - Erik Ingelsson
- Department of Medical Sciences, Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Ruth J F Loos
- Medical Research Council (MRC) Epidemiology Unit, University of Cambridge, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Patrik K E Magnusson
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Nicholas G Martin
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Grant W Montgomery
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Kari E North
- Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Nancy L Pedersen
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Timothy D Spector
- Department of Twin Research and Genetic Epidemiology, King's College London, St. Thomas' Hospital, London, UK
| | - Elizabeth K Speliotes
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Michael E Goddard
- Biosciences Research Division, Department of Primary Industries, Melbourne, Victoria, Australia
- Department of Food and Agricultural Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - Jian Yang
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
- University of Queensland Diamantina Institute, University of Queensland, Translational Research Institute, Brisbane, Queensland, Australia
| | - Peter M Visscher
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
- University of Queensland Diamantina Institute, University of Queensland, Translational Research Institute, Brisbane, Queensland, Australia
37
Risso D, Taglioli L, De Iasio S, Gueresi P, Alfani G, Nelli S, Rossi P, Paoli G, Tofanelli S. Estimating Sampling Selection Bias in Human Genetics: A Phenomenological Approach. PLoS One 2015; 10:e0140146. [PMID: 26452043 PMCID: PMC4599962 DOI: 10.1371/journal.pone.0140146] [Received: 06/19/2015] [Accepted: 09/21/2015] [Indexed: 11/18/2022]
Abstract
This research is the first empirical attempt to calculate the various components of the hidden bias associated with the sampling strategies routinely-used in human genetics, with special reference to surname-based strategies. We reconstructed surname distributions of 26 Italian communities with different demographic features across the last six centuries (years 1447-2001). The degree of overlapping between "reference founding core" distributions and the distributions obtained from sampling the present day communities by probabilistic and selective methods was quantified under different conditions and models. When taking into account only one individual per surname (low kinship model), the average discrepancy was 59.5%, with a peak of 84% by random sampling. When multiple individuals per surname were considered (high kinship model), the discrepancy decreased by 8-30% at the cost of a larger variance. Criteria aimed at maximizing locally-spread patrilineages and long-term residency appeared to be affected by recent gene flows much more than expected. Selection of the more frequent family names following low kinship criteria proved to be a suitable approach only for historically stable communities. In any other case true random sampling, despite its high variance, did not return more biased estimates than other selective methods. Our results indicate that the sampling of individuals bearing historically documented surnames (founders' method) should be applied, especially when studying the male-specific genome, to prevent an over-stratification of ancient and recent genetic components that heavily biases inferences and statistics.
Affiliation(s)
- Davide Risso
- National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, MD 20854, United States of America
- Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of BiGeA, University of Bologna, via Selmi 3, 40126 Bologna, Italy
| | - Luca Taglioli
- Dipartimento di Biologia, University of Pisa, Via Ghini 13, 56126 Pisa, Italy
| | - Sergio De Iasio
- Dipartimento di Genetica Biologia dei Microrganismi Antropologia Evoluzione, University of Parma, Parco Area delle Scienze 11/a, 43124 Parma, Italy
| | - Paola Gueresi
- Dipartimento di Scienze Statistiche, University of Bologna, Via Belle Arti 41, 40126 Bologna, Italy
| | - Guido Alfani
- Bocconi University, Dondena Centre and IGIER, Milan, Italy
| | | | - Paolo Rossi
- Dipartimento di Fisica, University of Pisa, Largo Bruno Pontecorvo 3, 56127 Pisa, Italy
| | - Giorgio Paoli
- Dipartimento di Biologia, University of Pisa, Via Ghini 13, 56126 Pisa, Italy
| | - Sergio Tofanelli
- Dipartimento di Biologia, University of Pisa, Via Ghini 13, 56126 Pisa, Italy
38
Balick DJ, Do R, Cassa CA, Reich D, Sunyaev SR. Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck. PLoS Genet 2015; 11:e1005436. [PMID: 26317225 PMCID: PMC4552954 DOI: 10.1371/journal.pgen.1005436] [Received: 11/13/2014] [Accepted: 07/09/2015] [Indexed: 11/30/2022]
Abstract
Population bottlenecks followed by re-expansions have been common throughout history of many populations. The response of alleles under selection to such demographic perturbations has been a subject of great interest in population genetics. On the basis of theoretical analysis and computer simulations, we suggest that this response qualitatively depends on dominance. The number of dominant or additive deleterious alleles per haploid genome is expected to be slightly increased following the bottleneck and re-expansion. In contrast, the number of completely or partially recessive alleles should be sharply reduced. Changes of population size expose differences between recessive and additive selection, potentially providing insight into the prevalence of dominance in natural populations. Specifically, we use a simple statistic, $B_R \equiv \sum_i x_i^{\mathrm{pop1}} / \sum_j x_j^{\mathrm{pop2}}$, where $x_i$ represents the derived allele frequency, to compare the number of mutations in different populations, and detail its functional dependence on the strength of selection and the intensity of the population bottleneck. We also provide empirical evidence showing that gene sets associated with autosomal recessive disease in humans may have a $B_R$ indicative of recessive selection. Together, these theoretical predictions and empirical observations show that complex demographic history may facilitate rather than impede inference of parameters of natural selection.

Dominance has played a central role in classical genetics since its inception. However, the effect of dominance introduces substantial technical complications into theoretical models describing dynamics of alleles in populations. As a result, dominance is often ignored in population genetic models. Statistical tests for selection built on these models do not discriminate between recessive and additive alleles. We show that historical changes in population size can provide a way to differentiate between recessive and additive selection. Our analysis compares two sub-populations with different demographic histories. History of our own species provides plenty of examples of sub-populations that went through population bottlenecks followed by re-expansions. We show that demographic differences, which generally complicate the analysis, can instead aid in the inference of features of natural selection.
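The ratio statistic described in this abstract can be illustrated with a minimal sketch. It assumes the derived allele frequencies of the sites of interest are already available as plain lists of numbers, one list per population; the function name `burden_ratio` and the example frequencies are hypothetical and are not taken from the paper. The sum of derived allele frequencies over sites is the expected number of derived alleles carried per haploid genome, so the ratio compares that quantity between the two populations.

```python
from typing import Sequence


def burden_ratio(freqs_pop1: Sequence[float], freqs_pop2: Sequence[float]) -> float:
    """Ratio of summed derived allele frequencies: sum(x_i in pop1) / sum(x_j in pop2).

    Hypothetical helper for illustration only; it is not the authors' code.
    """
    for x in (*freqs_pop1, *freqs_pop2):
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"allele frequency out of range [0, 1]: {x}")
    total_pop1 = sum(freqs_pop1)
    total_pop2 = sum(freqs_pop2)
    if total_pop2 == 0:
        raise ValueError("population 2 carries no derived alleles; ratio is undefined")
    return total_pop1 / total_pop2


# Made-up derived allele frequencies at four sites in two populations.
pop1 = [0.10, 0.05, 0.00, 0.20]
pop2 = [0.05, 0.05, 0.10, 0.05]
print(f"B_R = {burden_ratio(pop1, pop2):.2f}")
```

A value above 1 means population 1 carries more derived alleles per haploid genome at these sites than population 2, and a value below 1 means fewer. The sketch says nothing about statistical significance or about how sites are chosen.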
Affiliation(s)
- Daniel J. Balick
- Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
| | - Ron Do
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Christopher A. Cassa
- Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
| | - David Reich
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Shamil R. Sunyaev
- Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
39
Abstract
BACKGROUND: The isolation with migration (IM) model is important for studies in population genetics and phylogeography. IM program applies the IM model to genetic data drawn from a pair of closely related populations or species based on Markov chain Monte Carlo (MCMC) simulations of gene genealogies. But computational burden of IM program has placed limits on its application.

METHODOLOGY: With strong computational power, Graphics Processing Unit (GPU) has been widely used in many fields. In this article, we present an effective implementation of IM program on one GPU based on Compute Unified Device Architecture (CUDA), which we call gPGA.

CONCLUSIONS: Compared with IM program, gPGA can achieve up to 52.30X speedup on one GPU. The evaluation results demonstrate that it allows datasets to be analyzed effectively and rapidly for research on divergence population genetics. The software is freely available with source code at https://github.com/chunbaozhou/gPGA.
Affiliation(s)
- Chunbao Zhou
- Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
| | - Xianyu Lang
- Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
| | - Yangang Wang
- Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
| | - Chaodong Zhu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
40
Abstract
We consider an individual based model of phenotypic evolution in hermaphroditic populations which includes random and assortative mating of individuals. By increasing the number of individuals to infinity we obtain a nonlinear transport equation, which describes the evolution of phenotypic distribution. The main result of the paper is a theorem on asymptotic stability of trait distribution. This theorem is applied to models with the offspring trait distribution given by additive and multiplicative random perturbations of the parental mean trait.
Affiliation(s)
- Ryszard Rudnicki
- Institute of Mathematics, Polish Academy of Sciences, Bankowa 14, 40-007 Katowice, Poland
| | - Paweł Zwoleński
- Institute of Mathematics, Polish Academy of Sciences, Bankowa 14, 40-007 Katowice, Poland
41
Yan S, Wang CC, Zheng HX, Wang W, Qin ZD, Wei LH, Wang Y, Pan XD, Fu WQ, He YG, Xiong LJ, Jin WF, Li SL, An Y, Li H, Jin L. Y chromosomes of 40% Chinese descend from three Neolithic super-grandfathers. PLoS One 2014; 9:e105691. [PMID: 25170956 PMCID: PMC4149484 DOI: 10.1371/journal.pone.0105691] [Received: 12/17/2013] [Accepted: 07/24/2014] [Indexed: 12/21/2022]
Abstract
Demographic change of human populations is one of the central questions for delving into the past of human beings. To identify major population expansions related to male lineages, we sequenced 78 East Asian Y chromosomes at 3.9 Mbp of the non-recombining region, discovered >4,000 new SNPs, and identified many new clades. The relative divergence dates can be estimated much more precisely using a molecular clock. We found that all the Paleolithic divergences were binary; however, three strong star-like Neolithic expansions at ∼6 kya (thousand years ago) (assuming a constant substitution rate of 1×10^-9/bp/year) indicate that ∼40% of modern Chinese are patrilineal descendants of only three super-grandfathers at that time. This observation suggests that the main patrilineal expansion in China occurred in the Neolithic Era and might be related to the development of agriculture.
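The figures quoted in this abstract allow a small worked example of strict molecular clock arithmetic. The sketch below is not the authors' dating procedure; it only multiplies the stated substitution rate by the stated region length and time to show the scale involved. The function names are hypothetical, and the calculation assumes a constant rate and a single lineage.

```python
def expected_substitutions(rate_per_bp_per_year: float, region_bp: float, years: float) -> float:
    """Expected substitutions on one lineage under a strict molecular clock."""
    return rate_per_bp_per_year * region_bp * years


def years_from_substitutions(n_substitutions: float, rate_per_bp_per_year: float, region_bp: float) -> float:
    """Invert the clock: time needed to accumulate n substitutions on one lineage."""
    if rate_per_bp_per_year <= 0 or region_bp <= 0:
        raise ValueError("rate and region length must be positive")
    return n_substitutions / (rate_per_bp_per_year * region_bp)


RATE = 1e-9        # substitutions per bp per year, the rate assumed in the abstract
REGION_BP = 3.9e6  # 3.9 Mbp of non-recombining Y-chromosome sequence
AGE_YEARS = 6_000  # roughly 6 kya, the date given for the expansions

n_expected = expected_substitutions(RATE, REGION_BP, AGE_YEARS)
print(f"Expected substitutions per lineage since the expansion: {n_expected:.1f}")
print(f"Years per substitution in this region: {years_from_substitutions(1, RATE, REGION_BP):.0f}")
```

Under these assumptions each lineage is expected to have accumulated roughly 23 substitutions in the sequenced region since the expansions, which gives a sense of the resolution available for dating them.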
Affiliation(s)
- Shi Yan
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Chinese Academy of Sciences Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, China
| | - Chuan-Chao Wang
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Hong-Xiang Zheng
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Wei Wang
- Chinese Academy of Sciences Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, China
| | - Zhen-Dong Qin
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Lan-Hai Wei
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Yi Wang
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Xue-Dong Pan
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Wen-Qing Fu
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Yun-Gang He
- Chinese Academy of Sciences Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, China
| | - Li-Jun Xiong
- Epigenetics Laboratory, Institute of Biomedical Sciences, Fudan University, Shanghai, China
| | - Wen-Fei Jin
- Chinese Academy of Sciences Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, China
| | - Shi-Lin Li
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Yu An
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Hui Li
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Chinese Academy of Sciences Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, China
42
Shem-Tov D, Halperin E. Historical pedigree reconstruction from extant populations using PArtitioning of RElatives (PREPARE). PLoS Comput Biol 2014; 10:e1003610. [PMID: 24945698 PMCID: PMC4063675 DOI: 10.1371/journal.pcbi.1003610] [Received: 10/30/2013] [Accepted: 03/13/2014] [Indexed: 11/18/2022]
Abstract
Recent technological improvements in the field of genetic data extraction give rise to the possibility of reconstructing the historical pedigrees of entire populations from the genotypes of individuals living today. Current methods are still not practical for real data scenarios as they have limited accuracy and make unrealistic assumptions of monogamy and synchronized generations. In order to address these issues, we develop a new method for pedigree reconstruction, PREPARE, which is based on formulations of the pedigree reconstruction problem as variants of graph coloring. The new formulation allows us to consider features that were overlooked by previous methods, resulting in a reconstruction of up to 5 generations back in time, with an order of magnitude improvement of false-negative rates over the state of the art, while keeping a lower level of false-positive rates. We demonstrate the accuracy of PREPARE compared to previous approaches using simulation studies over a range of population sizes, including inbred and outbred populations, monogamous and polygamous mating patterns, as well as synchronous and asynchronous mating.

Learning the correct relationships between individuals from genetic data is a basic theoretical problem in the field of genetics, and has many practical consequences. A wide variety of statistical methods for genetic analysis assume the relationships between individuals are known, and can manifest relatedness information to improve inference. The current state-of-the-art methods for relationship inference consider pair-wise genetic similarity, and use it to infer the relationship between each pair of individuals. Reconstructing the pedigrees of an entire population directly has the potential to use more elaborate relationship information, and thus obtains a better prediction of the familial relationships in the population. In contrast to the full set of pair-wise relationships in a population, genetic pedigrees provide a lossless and conflict-free structure for depicting the relationships between individuals. In an effort to make pedigree reconstruction practical we developed a new method, which is an order of magnitude more accurate than previous methods, and is the first method that has the ability to reconstruct polygamous pedigrees.
Affiliation(s)
- Doron Shem-Tov
  - The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Eran Halperin
  - The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
  - International Computer Science Institute, Berkeley, California, United States of America
  - Molecular Microbiology and Biotechnology Department, Tel-Aviv University, Tel-Aviv, Israel
43
Lao O, Liu F, Wollstein A, Kayser M. GAGA: a new algorithm for genomic inference of geographic ancestry reveals fine level population substructure in Europeans. PLoS Comput Biol 2014; 10:e1003480. [PMID: 24586132 PMCID: PMC3930519 DOI: 10.1371/journal.pcbi.1003480] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 06/17/2013] [Accepted: 01/03/2014] [Indexed: 02/04/2023] Open
Abstract
Attempts to detect genetic population substructure in humans are troubled by the fact that the vast majority of the total amount of observed genetic variation is present within populations rather than between populations. Here we introduce a new algorithm for transforming a genetic distance matrix that reduces the within-population variation considerably. Extensive computer simulations revealed that the transformed matrix captured the genetic population differentiation better than the original one, which was based on the T1 statistic. In an empirical genomic data set comprising 2,457 individuals from 23 different European subpopulations, the proportion of individuals that were determined as a genetic neighbour to another individual from the same sampling location increased from 25% with the original matrix to 52% with the transformed matrix. Similarly, the percentage of genetic variation explained between populations by means of Analysis of Molecular Variance (AMOVA) increased from 1.62% to 7.98%. Furthermore, the first two dimensions of a classical multidimensional scaling (MDS) using the transformed matrix explained 15% of the variance, compared to 0.7% obtained with the original matrix. Application of MDS with Mclust, SPA with Mclust, and GemTools algorithms to the same dataset also showed that the transformed matrix gave a better association of the genetic clusters with the sampling locations, and particularly so when it was used in the AMOVA framework with a genetic algorithm. Overall, the new matrix transformation introduced here substantially reduces the within-population genetic differentiation, and can be broadly applied to methods such as AMOVA to enhance their sensitivity to reveal population substructure.
We herewith provide a publicly available (http://www.erasmusmc.nl/fmb/resources/GAGA) model-free method for improved genetic population substructure detection that can be applied to human as well as any other species data in future studies relevant to evolutionary biology, behavioural ecology, medicine, and forensics.
Affiliation(s)
- Oscar Lao
  - Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
- Fan Liu
  - Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
- Andreas Wollstein
  - Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
  - Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
- Manfred Kayser
  - Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
44
Martin AR, Tse G, Bustamante CD, Kenny EE. Imputation-based assessment of next generation rare exome variant arrays. Pac Symp Biocomput 2014:241-252. [PMID: 24297551 PMCID: PMC3900244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
A striking finding from recent large-scale sequencing efforts is that the vast majority of variants in the human genome are rare and found within single populations or lineages. These observations hold important implications for the design of the next round of disease variant discovery efforts: if genetic variants that influence disease risk follow the same trend, then we expect to see population-specific disease associations that require large sample sizes for detection. To address this challenge, and due to the still prohibitive cost of sequencing large cohorts, researchers have developed a new generation of low-cost genotyping arrays that assay rare variation previously identified from large exome sequencing studies. Genotyping approaches rely not only on directly observing variants, but also on phasing and imputation methods that use publicly available reference panels to infer unobserved variants in a study cohort. Rare variant exome arrays are intentionally enriched for variants likely to be disease-causing, and here we assay the ability of the first commercially available rare exome variant array (the Illumina Infinium HumanExome BeadChip) to also tag other potentially damaging variants not molecularly assayed. Using full sequence data from chromosome 22 from the phase I 1000 Genomes Project, we evaluate three methods for imputation (BEAGLE, MaCH-Admix, and SHAPEIT2/IMPUTE2) with the rare exome variant array under varied study panel sizes, reference panel sizes, and LD structures via population differences. We find that imputation is more accurate across both the genome and exome for common variant arrays than the next generation array for all allele frequencies, including rare alleles. We also find that imputation is the least accurate in African populations, and accuracy is substantially improved for rare variants when the same population is included in the reference panel.
Depending on the goals of GWAS researchers, our results will aid budget decisions by helping determine whether money is best spent sequencing the genomes of smaller sample sizes, genotyping larger sample sizes with rare and/or common variant arrays and imputing SNPs, or some combination of the two.
Affiliation(s)
- Alicia R. Martin
  - Department of Genetics & Biomedical Informatics Training Program, Stanford University, Stanford, CA, 94305
- Gerard Tse
  - Department of Computer Science, Stanford University, Stanford, CA, 94305
- Eimear E. Kenny
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029
45
Abreu-Głowacka M, Zaba C, Koralewska-Kordel M, Michalak E, Przybylski Z. [Polish population data for 17 Y-STRs and 8 Y-SNPs markers]. Arch Med Sadowej Kryminol 2013; 63:201-215. [PMID: 24672896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023] Open
Abstract
The aim of our study was to establish the genetic differentiation of the population of the province of Wielkopolska (Greater Poland) for 17 Y-STRs and 8 Y-SNPs and to compare the Polish population with other selected populations. The investigations included 201 unrelated male inhabitants of the Greater Poland region. We found 184 unique haplotypes for the 17 Y-STRs. The haplotype discrimination capacity was 0.96. The most frequent haplotype, Ht-50, was found in 3 samples, and 7 haplotypes were observed twice. Further, the same samples were analyzed with the 8 Y-SNP markers. We obtained 40 haplotypes. The haplotype discrimination capacity was 0.20. The most frequent haplotype was present in 38 samples. A total of 4 different haplogroups were established: K = 19%, IJ = 7%, R1a1 = 59% and R1b = 15%. The HD value of Y-SNPs/Y-STRs was 0.9883.
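The two discrimination capacity values in this abstract can be checked with a short sketch. It assumes the common definition (number of distinct haplotypes divided by number of samples) and reads "184 unique haplotypes" as haplotypes seen in exactly one sample; neither assumption is stated in the abstract, and the function name is illustrative only.

```python
def discrimination_capacity(haplotype_counts):
    """Distinct haplotypes divided by total samples (assumed definition)."""
    n_samples = sum(haplotype_counts)
    return len(haplotype_counts) / n_samples

# Y-STR counts as read from the abstract (assumed interpretation):
# 184 haplotypes seen once, 7 seen twice, and Ht-50 seen three times.
y_str_counts = [1] * 184 + [2] * 7 + [3]
n_samples = sum(y_str_counts)                      # 184 + 14 + 3 samples
dc_y_str = discrimination_capacity(y_str_counts)   # 192 distinct haplotypes

# Y-SNP: only the number of haplotypes (40) and the sample size are reported.
dc_y_snp = 40 / n_samples

print(n_samples, round(dc_y_str, 2), round(dc_y_snp, 2))
```

Under these assumptions the counts add up to the 201 sampled men, and the ratios round to the reported 0.96 and 0.20.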
46
Taylor D, Bright JA, Buckleton J. The interpretation of single source and mixed DNA profiles. Forensic Sci Int Genet 2013; 7:516-28. [PMID: 23948322 DOI: 10.1016/j.fsigen.2013.05.011] [Citation(s) in RCA: 189] [Impact Index Per Article: 17.2] [Received: 05/17/2012] [Revised: 05/14/2013] [Accepted: 05/23/2013] [Indexed: 11/19/2022]
Abstract
A method for interpreting autosomal mixed DNA profiles based on continuous modelling of peak heights is described. MCMC is applied with a model for allelic and stutter heights to produce a probability for the data given a specified genotype combination. The theory extends to handle any number of contributors and replicates, although practical implementation limits analyses to four contributors. The probability of the peak data given a genotype combination has proven to be a highly intuitive probability that may be assessed subjectively by experienced caseworkers. Whilst caseworkers will not assess the probabilities per se, they can broadly judge genotypes that fit the observed data well, and those that fit relatively less well. These probabilities are used when calculating a subsequent likelihood ratio. The method has been trialled on a number of mixed DNA profiles constructed from known contributors. The results have been assessed against a binary approach and also compared with the subjective judgement of an analyst.
Affiliation(s)
- Duncan Taylor
  - Forensic Science South Australia, 21 Divett Place, Adelaide, SA 5000, Australia
47
Abstract
The advent of modern DNA sequencing technology is the driving force in obtaining complete intra-specific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows SweeD to be deployed on cluster systems with queue execution time restrictions, as well as to resume long-running analyses after processor failures. In addition, the user can specify various demographic models via the command line to calculate their theoretically expected site frequency spectra. Therefore, in contrast to SweepFinder, the neutral site frequencies can optionally be directly calculated from a given demographic model. We show that an increase of sample size results in more precise detection of positive selection. Thus, the ability to analyze substantially larger sample sizes by using SweeD leads to more accurate sweep detection. We validate SweeD via simulations and by scanning the first chromosome from the human 1000 Genomes Project for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.
Affiliation(s)
- Pavlos Pavlidis
  - Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg, Heidelberg, Germany.
48
Abstract
Approximate Bayesian computation has become an essential tool for the analysis of complex stochastic models when the likelihood function is numerically unavailable. However, the well-established statistical method of empirical likelihood provides another route to such settings that bypasses simulations from the model and the choices of the approximate Bayesian computation parameters (summary statistics, distance, tolerance), while being convergent in the number of observations. Furthermore, bypassing model simulations may lead to significant time savings in complex models, for instance those found in population genetics. The Bayesian computation with empirical likelihood algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard distributions, time series, and population genetics models.
Affiliation(s)
- Kerrie L. Mengersen
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD 4001, Australia
- Pierre Pudlo
  - Centre de Biologie pour la Gestion des Populations, Institut National de la Recherche Agronomique, 34988 Montferrier-sur-Lez Cedex, France
  - Université Montpellier 2, Institut de Mathématiques et de Modélisation de Montpellier, 34095 Montpellier Cedex 5, France
  - Institut de Biologie Computationnelle, Montpellier, France
- Christian P. Robert
  - Université Paris Dauphine, Centre de Recherche en Mathematiques de la Decision, 75775 Paris Cedex 16, France
  - Institut Universitaire de France, Paris, France
  - Centre de Recherche en Statistique et Economie, 92245 Malakoff Cedex, France
49
Bush WS, Boston J, Pendergrass SA, Dumitrescu L, Goodloe R, Brown-Gentry K, Wilson S, McClellan B, Torstenson E, Basford MA, Spencer KL, Ritchie MD, Crawford DC. Enabling high-throughput genotype-phenotype associations in the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project as part of the Population Architecture using Genomics and Epidemiology (PAGE) study. Pac Symp Biocomput 2013:373-84. [PMID: 23424142 PMCID: PMC3579641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Genetic association studies have rapidly become a major tool for identifying the genetic basis of common human diseases. The advent of cost-effective genotyping coupled with large collections of samples linked to clinical outcomes and quantitative traits now makes it possible to systematically characterize genotype-phenotype relationships in diverse populations and extensive datasets. To capitalize on these advancements, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project, as part of the collaborative Population Architecture using Genomics and Epidemiology (PAGE) study, accesses two collections: the National Health and Nutrition Examination Surveys (NHANES) and BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records. We describe herein the workflows for accessing and using the epidemiologic (NHANES) and clinical (BioVU) collections, where each workflow has been customized to reflect the content and data access limitations of each respective source. We also describe the process by which these data are generated, standardized, and shared for meta-analysis among the PAGE study sites. As a specific example of the use of BioVU, we describe the data mining efforts to define cases and controls for genetic association studies of common cancers in PAGE. Collectively, the efforts described here are a generalized outline for many of the successful approaches that can be used in the era of high-throughput genotype-phenotype associations for moving biomedical discovery forward to new frontiers of data generation and analysis.
Affiliation(s)
- William S. Bush
  - Department of Biomedical Informatics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
- Jonathan Boston
  - Center for Human Genetics Research, Vanderbilt University, 1207 17th Avenue, Suite 300, Nashville, TN 37232, USA
- Sarah A. Pendergrass
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 503 Wartik Lab, University Park, PA 16802, USA
- Logan Dumitrescu
  - Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
- Robert Goodloe
  - Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
- Kristin Brown-Gentry
  - Center for Human Genetics Research, Vanderbilt University, 1207 17th Avenue, Suite 300, Nashville, TN 37232, USA
- Sarah Wilson
  - Center for Human Genetics Research, Vanderbilt University, 1207 17th Avenue, Suite 300, Nashville, TN 37232, USA
- Bob McClellan
  - Center for Human Genetics Research, Vanderbilt University, 1207 17th Avenue, Suite 300, Nashville, TN 37232, USA
- Eric Torstenson
  - Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
- Melissa A. Basford
  - Office of Research, Office of Personalized Medicine, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203, USA
- Kylee L. Spencer
  - Biology and Environmental Science, Heidelberg University, Bareis Hall 131, 310 East Market Street, Tiffin, OH 44883, USA
- Marylyn D. Ritchie
  - Center for System Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, 512 Wartik Lab, University Park, PA 16802, USA
- Dana C. Crawford
  - Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
50
Kopelman NM, Stone L, Gascuel O, Rosenberg NA. The behavior of admixed populations in neighbor-joining inference of population trees. Pac Symp Biocomput 2013:273-284. [PMID: 23424132 PMCID: PMC3597466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Neighbor-joining is one of the most widely used methods for constructing evolutionary trees. This approach from phylogenetics is often employed in population genetics, where distance matrices obtained from allele frequencies are used to produce a representation of population relationships in the form of a tree. In phylogenetics, the utility of neighbor-joining derives partly from a result that for a class of distance matrices including those that are additive or tree-like (generated by summing weights over the edges connecting pairs of taxa in a tree to obtain pairwise distances), application of neighbor-joining recovers exactly the underlying tree. For populations within a species, however, migration and admixture can produce distance matrices that reflect more complex processes than those obtained from the bifurcating trees typical in the multispecies context. Admixed populations (populations descended from recent mixture of groups that have long been separated) have been observed to be located centrally in inferred neighbor-joining trees, with short external branches incident to the path connecting their source populations. Here, using a simple model, we explore mathematically the behavior of an admixed population under neighbor-joining. We show that with an additive distance matrix, a population admixed among two source populations necessarily lies on the path between the sources. Relaxing the additivity requirement, we examine the smallest nontrivial case (four populations, one of which is admixed between two of the other three), showing that the two source populations never merge with each other before one of them merges with the admixed population. Furthermore, the distance on the constructed tree between the admixed population and either source population is always smaller than the distance between the source populations, and the external branch for the admixed population is always incident to the path connecting the sources.
We define three properties that hold for four taxa and that we hypothesize are satisfied under more general conditions: antecedence of clustering, intermediacy of distances, and intermediacy of path lengths. Our findings can inform interpretations of neighbor-joining trees with admixed groups, and they provide an explanation for patterns observed in trees of human populations.
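The four-population case can be illustrated with the first step of neighbor-joining, which joins the pair minimizing the standard criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The sketch below is not the authors' model: the linear admixture distances, the admixture fraction `alpha`, and all numeric values are assumptions chosen only for illustration.

```python
from itertools import combinations

def nj_q_values(names, dist):
    """Neighbor-joining Q criterion for every unordered pair of taxa."""
    n = len(names)
    row_sum = {
        a: sum(dist[frozenset((a, b))] for b in names if b != a)
        for a in names
    }
    return {
        (a, b): (n - 2) * dist[frozenset((a, b))] - row_sum[a] - row_sum[b]
        for a, b in combinations(names, 2)
    }

# Hypothetical inputs: sources S1 and S2, admixed population A, outgroup O.
alpha = 0.5                # assumed fraction of A's ancestry from S1
d_sources = 10.0           # assumed distance between S1 and S2
d_s1_o, d_s2_o = 8.0, 9.0  # assumed distances from each source to O

names = ["S1", "S2", "A", "O"]
dist = {
    frozenset(("S1", "S2")): d_sources,
    frozenset(("S1", "O")): d_s1_o,
    frozenset(("S2", "O")): d_s2_o,
    # Assumed linear admixture model: A sits between the sources, and its
    # distance to O is the ancestry-weighted mean of the source distances.
    frozenset(("S1", "A")): (1 - alpha) * d_sources,
    frozenset(("S2", "A")): alpha * d_sources,
    frozenset(("A", "O")): alpha * d_s1_o + (1 - alpha) * d_s2_o,
}

q = nj_q_values(names, dist)
q_min = min(q.values())
first_joins = [pair for pair, value in q.items() if value == q_min]
print(first_joins)
```

With four taxa, complementary pairs always share the same Q value, so the two minimizing pairs describe the same split. In this example the split groups S2 with A (equivalently S1 with O) rather than S1 with S2, in line with the antecedence-of-clustering property described in the abstract.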
Affiliation(s)
- Naama M. Kopelman
  - Porter School of Environmental Studies, Department of Zoology, Tel Aviv University, Ramat Aviv, Israel
- Lewi Stone
  - Porter School of Environmental Studies, Department of Zoology, Tel Aviv University, Ramat Aviv, Israel
- Olivier Gascuel
  - Méthodes et Algorithmes pour la Bioinformatique, LIRMM-CNRS, Montpellier, France