1
Song M, Zhou Y, Zhao C, Song F, Hou Y. YHP: Y-chromosome Haplogroup Predictor for predicting male lineages based on Y-STRs. Forensic Sci Int 2024; 361:112113. [PMID: 38936202 DOI: 10.1016/j.forsciint.2024.112113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/24/2024] [Accepted: 06/16/2024] [Indexed: 06/29/2024]
Abstract
The human Y chromosome reflects the evolutionary process of males. Male lineage tracing by Y chromosome is of great use in evolutionary, forensic, and anthropological studies. Identifying the male lineage based on the specific distribution of Y haplogroups narrows down the investigation scope, an approach that has been used in forensic scenarios. However, although existing software aids familial searching by using Y-STRs (Y-chromosome short tandem repeats) to predict Y-SNP (Y-chromosome single nucleotide polymorphism) haplogroups, it often lacks resolution. In this study, we developed YHP (Y Haplogroup Predictor), a novel software offering high-resolution haplogroup inference without requiring extensive Y-SNP sequencing. Leveraging existing datasets (219 haplogroups, 4064 samples in total) and employing a random forest algorithm, YHP predicts haplogroups with an accuracy of 0.923 at the highest haplogroup resolution. YHP, available on GitHub (https://github.com/cissy123/YHP-Y-Haplogroup-Predictor-), facilitates high-resolution haplogroup prediction, haplotype mismatch analysis, and haplotype similarity comparison. Notably, it demonstrates efficacy in East Asian populations, benefiting from training data from eight distinct East Asian ethnic populations. Moreover, it allows additional training sets to be integrated seamlessly, extending its utility to diverse populations.
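The haplotype mismatch analysis and similarity comparison mentioned in this abstract can be pictured with a minimal sketch. It assumes each Y-STR haplotype is stored as a mapping from locus name to a single repeat count and defines similarity as the fraction of shared loci with identical alleles. The function names and this similarity definition are illustrative assumptions, not YHP's actual code or scoring, and multi-copy loci such as DYS385a/b are not handled.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: locus name -> repeat count (single-copy loci only).
Haplotype = Dict[str, int]


def mismatch_loci(a: Haplotype, b: Haplotype) -> List[str]:
    """Loci typed in both haplotypes whose repeat counts differ (sorted by name)."""
    shared = sorted(set(a) & set(b))
    return [locus for locus in shared if a[locus] != b[locus]]


def similarity(a: Haplotype, b: Haplotype) -> float:
    """Fraction of shared loci with identical repeat counts; 0.0 if none are shared."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return 1.0 - len(mismatch_loci(a, b)) / len(shared)


def rank_by_similarity(query: Haplotype, refs: Dict[str, Haplotype]) -> List[Tuple[str, float]]:
    """Reference samples ordered from most to least similar to the query."""
    scores = [(name, similarity(query, hap)) for name, hap in refs.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical example values, not real casework data.
query = {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS392": 13}
refs = {
    "S1": {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS392": 13},
    "S2": {"DYS19": 14, "DYS390": 22, "DYS391": 11, "DYS392": 13},
}
print(mismatch_loci(query, refs["S1"]))  # ['DYS390']
print(rank_by_similarity(query, refs))   # [('S1', 0.75), ('S2', 0.25)]
```

With these hypothetical haplotypes, `S1` differs from the query only at DYS390 and therefore ranks first. The haplogroup prediction step itself, which the abstract attributes to a random forest trained on labelled samples, is not reproduced here.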
Affiliation(s)
- Mengyuan Song
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China; Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, China
- Yuxiang Zhou
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Chenxi Zhao
- College of Computer Science, Sichuan University, Chengdu, China
- Feng Song
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Yiping Hou
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
2
García-Olivares V, Muñoz-Barrera A, Rubio-Rodríguez LA, Jáspez D, Díaz-de Usera A, Iñigo-Campos A, Veeramah KR, Alonso S, Thomas MG, Lorenzo-Salazar JM, González-Montelongo R, Flores C. Benchmarking of human Y-chromosomal haplogroup classifiers with whole-genome and whole-exome sequence data. Comput Struct Biotechnol J 2023; 21:4613-4618. [PMID: 37817776 PMCID: PMC10560978 DOI: 10.1016/j.csbj.2023.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 10/12/2023]
Abstract
In anthropological, medical, and forensic studies, the non-recombining region of the human Y chromosome (NRY) enables accurate reconstruction of pedigree relationships and retrieval of ancestral information. Using high-throughput sequencing (HTS) data, we present a benchmarking analysis of command-line tools for NRY haplogroup classification. The evaluation was performed using paired Illumina data from whole-genome sequencing (WGS) and whole-exome sequencing (WES) experiments on 50 unrelated donors. Additionally, for validation, we used paired WGS/WES datasets of 54 individuals from the 1000 Genomes Project. Finally, we evaluated the tools on third-generation HTS data obtained from a subset of donors and one reference sample. Our results show that WES, despite typically offering less genealogical resolution than WGS, is an effective method for determining the NRY haplogroup. Y-LineageTracker and Yleaf showed the highest accuracy for WGS data, correctly classifying 98% and 96% of the samples, respectively. Yleaf outperformed all benchmarked tools on WES data, correctly classifying approximately 90% of the samples. Yleaf, Y-LineageTracker, and pathPhynder correctly classified most samples (88%) sequenced with third-generation HTS. Overall, Yleaf provides the best performance for applications that use WGS and WES. Our study offers researchers a guide for selecting the most appropriate tool to analyze the NRY region using both second- and third-generation HTS data.
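Accuracy figures like those in this benchmark depend on how a call is scored. One hedged way to sketch this is to count exact matches and, separately, calls that lie on the same lineage as the expected haplogroup. The prefix rule used below is an assumption that only approximates an ancestor/descendant relationship for ISOGG-style alphanumeric names (e.g. `R1b1a`) and fails for SNP-based names such as R-M269; the benchmark's actual scoring criteria are not described in the abstract, and all function names are hypothetical.

```python
from typing import List, Tuple


def is_lineage_consistent(truth: str, pred: str) -> bool:
    """Crude ancestor/descendant check: one label is a prefix of the other.
    Only meaningful for ISOGG-style alphanumeric names such as 'R1b1a'."""
    return truth.startswith(pred) or pred.startswith(truth)


def score_calls(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (exact accuracy, lineage-consistent rate) over (truth, predicted) pairs."""
    if not pairs:
        raise ValueError("no calls to score")
    exact = sum(truth == pred for truth, pred in pairs)
    consistent = sum(is_lineage_consistent(truth, pred) for truth, pred in pairs)
    return exact / len(pairs), consistent / len(pairs)


# Hypothetical (expected, predicted) haplogroup calls.
calls = [("R1b1a", "R1b1a"), ("O2a2b", "O2a"), ("E1b1a", "E1b1b")]
exact_acc, consistent_rate = score_calls(calls)
print(f"exact: {exact_acc:.2f}, consistent: {consistent_rate:.2f}")  # exact: 0.33, consistent: 0.67
```

Separating the two rates makes it visible when a tool is merely under-resolving (calling an ancestral node such as `O2a`) rather than assigning a wrong branch (`E1b1b` for `E1b1a`).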
Affiliation(s)
- Víctor García-Olivares
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Plataforma Genómica de Alto Rendimiento para el Estudio de la Biodiversidad, Instituto de Productos Naturales y Agrobiología (IPNA), Consejo Superior de Investigaciones Científicas, San Cristóbal de La Laguna, Spain
- Adrián Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Luis A. Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Ana Díaz-de Usera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Antonio Iñigo-Campos
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Krishna R. Veeramah
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, United States
- Santos Alonso
- Department of Genetics, Physical Anthropology and Animal Physiology, University of the Basque Country UPV/EHU, Leioa, Bizkaia, Spain
- María Goyri Building, Biotechnology Center, Human Molecular Evolution Lab 2.08 UPV/EHU Science Park, 48940 Leioa, Bizkaia, Spain
- Mark G. Thomas
- UCL Genetics Institute, University College London (UCL), Gower Street, London WC1E 6BT, United Kingdom
- Research Department of Genetics, Evolution & Environment, University College London (UCL), Darwin Building, Gower Street, London WC1E 6BT, United Kingdom
- José M. Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Rafaela González-Montelongo
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Plataforma Genómica de Alto Rendimiento para el Estudio de la Biodiversidad, Instituto de Productos Naturales y Agrobiología (IPNA), Consejo Superior de Investigaciones Científicas, San Cristóbal de La Laguna, Spain
- Carlos Flores
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Plataforma Genómica de Alto Rendimiento para el Estudio de la Biodiversidad, Instituto de Productos Naturales y Agrobiología (IPNA), Consejo Superior de Investigaciones Científicas, San Cristóbal de La Laguna, Spain
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, Madrid, Spain
- Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, Las Palmas de Gran Canaria, Spain
3
Y-SNP Haplogroup Hierarchy Finder: a web tool for Y-SNP haplogroup assignment. J Hum Genet 2022; 67:487-493. [DOI: 10.1038/s10038-022-01033-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Revised: 03/03/2022] [Accepted: 03/09/2022] [Indexed: 11/08/2022]
4
Zho Z, Zhou Y, Li Z, Yao Y, Yang Q, Qian J, Shao C, Qian X, Sun K, Tang Q, Xie J. Identification and assessment of a subset of Y-SNPs with recurrent mutation for forensic purpose. Forensic Sci Int 2022; 334:111270. [DOI: 10.1016/j.forsciint.2022.111270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/13/2021] [Revised: 01/18/2022] [Accepted: 03/09/2022] [Indexed: 11/26/2022]
5
Jagadeesan A, Ebenesersdóttir SS, Guðmundsdóttir VB, Thordardottir EL, Moore KHS, Helgason A. HaploGrouper: a generalized approach to haplogroup classification. Bioinformatics 2021; 37:570-572. [PMID: 32805011 DOI: 10.1093/bioinformatics/btaa729] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/28/2019] [Revised: 08/09/2020] [Accepted: 08/12/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION We introduce HaploGrouper, a versatile software to classify haplotypes into haplogroups on the basis of a known phylogenetic tree. A typical use case for this software is the assignment of haplogroups to human mitochondrial DNA (mtDNA) or Y-chromosome haplotypes. Existing state-of-the-art haplogroup-calling software is typically hard-wired to work only with either mtDNA or Y-chromosome haplotypes from humans. RESULTS HaploGrouper exhibits comparable accuracy in these instances and has the advantage of being able to assign haplogroups to any kind of haplotype from any species, given an extant annotated phylogenetic tree defined by sequence variants. AVAILABILITY AND IMPLEMENTATION The software is available at the following URL https://gitlab.com/bio_anth_decode/haploGrouper. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
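The core idea of classifying a haplotype against a phylogenetic tree defined by sequence variants can be illustrated with a simplified sketch: walk down from the root and keep the deepest branch whose defining variants are all observed in the derived state. This is an assumption-laden toy, not HaploGrouper's algorithm; real classifiers also weigh ancestral calls, missing data, and recurrent or conflicting markers. The tree, node names, and marker names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    markers: Set[str]  # variants defining the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def assign_haplogroup(root: Node, derived: Set[str]) -> str:
    """Return the deepest node reachable from the root by following only
    branches whose defining markers are all in the derived state."""
    best_name, best_depth = root.name, 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if depth > best_depth:
            best_name, best_depth = node.name, depth
        for child in node.children:
            if child.markers <= derived:  # every defining variant observed as derived
                stack.append((child, depth + 1))
    return best_name


# Hypothetical toy tree; marker names m1..m4 are placeholders.
tree = Node("Root", set(), [
    Node("A", {"m1"}),
    Node("B", {"m2"}, [Node("B1", {"m3", "m4"})]),
])
print(assign_haplogroup(tree, {"m2", "m3", "m4"}))  # B1
print(assign_haplogroup(tree, {"m2", "m3"}))        # B
```

Because the tree is passed in as data, the same routine works for any tree and marker set, which mirrors (in a much simpler form) the species-agnostic design the abstract describes.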
Affiliation(s)
- Anuradha Jagadeesan
- deCODE Genetics/Amgen, Reykjavik 101, Iceland
- Department of Anthropology, University of Iceland, Reykjavik 101, Iceland
- S Sunna Ebenesersdóttir
- deCODE Genetics/Amgen, Reykjavik 101, Iceland
- Department of Anthropology, University of Iceland, Reykjavik 101, Iceland
- Valdis B Guðmundsdóttir
- deCODE Genetics/Amgen, Reykjavik 101, Iceland
- Department of Anthropology, University of Iceland, Reykjavik 101, Iceland
- Agnar Helgason
- deCODE Genetics/Amgen, Reykjavik 101, Iceland
- Department of Anthropology, University of Iceland, Reykjavik 101, Iceland
6
Sengupta D, Choudhury A, Fortes-Lima C, Aron S, Whitelaw G, Bostoen K, Gunnink H, Chousou-Polydouri N, Delius P, Tollman S, Gómez-Olivé FX, Norris S, Mashinya F, Alberts M, Hazelhurst S, Schlebusch CM, Ramsay M. Genetic substructure and complex demographic history of South African Bantu speakers. Nat Commun 2021; 12:2080. [PMID: 33828095 PMCID: PMC8027885 DOI: 10.1038/s41467-021-22207-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 07/10/2020] [Accepted: 02/10/2021] [Indexed: 02/01/2023]
Abstract
South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.
Affiliation(s)
- Dhriti Sengupta
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Ananyo Choudhury
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Cesar Fortes-Lima
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Shaun Aron
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Gavin Whitelaw
- KwaZulu-Natal Museum, Pietermaritzburg, South Africa
- School of Geography, Archaeology & Environmental Studies, University of the Witwatersrand, Johannesburg, South Africa
- Koen Bostoen
- UGent Centre for Bantu Studies, Department of Languages and Cultures, Ghent University, Ghent, Belgium
- Hilde Gunnink
- UGent Centre for Bantu Studies, Department of Languages and Cultures, Ghent University, Ghent, Belgium
- Natalia Chousou-Polydouri
- Department of Comparative Linguistic Science and Center for the Interdisciplinary Study of Language Evolution, University of Zürich, Zürich, Switzerland
- Peter Delius
- Department of History, University of the Witwatersrand, Johannesburg, South Africa
- Stephen Tollman
- MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- F Xavier Gómez-Olivé
- MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Shane Norris
- MRC/Wits Developmental Pathways for Health Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Felistas Mashinya
- Department of Pathology and Medical Sciences; School of Health Care Sciences, Faculty of Health Sciences, University of Limpopo, Polokwane, South Africa
- Marianne Alberts
- Department of Pathology and Medical Sciences; School of Health Care Sciences, Faculty of Health Sciences, University of Limpopo, Polokwane, South Africa
- Scott Hazelhurst
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
- Carina M Schlebusch
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- SciLifeLab, Uppsala, Sweden
- Palaeo-Research Institute, University of Johannesburg, Johannesburg, South Africa
- Michèle Ramsay
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
7
Chen H, Lu Y, Lu D, Xu S. Y-LineageTracker: a high-throughput analysis framework for Y-chromosomal next-generation sequencing data. BMC Bioinformatics 2021; 22:114. [PMID: 33750289 PMCID: PMC7941695 DOI: 10.1186/s12859-021-04057-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 12/02/2020] [Accepted: 02/28/2021] [Indexed: 01/25/2023]
Abstract
BACKGROUND Y-chromosome DNA (Y-DNA) has been used for tracing paternal lineages and offers a clear path from an individual to a known, or likely, direct paternal ancestor. Advances in next-generation sequencing (NGS) technologies have steadily improved the resolution of the non-recombining region of the Y chromosome (NRY). However, a lack of suitable computational tools has prevented NGS data from being fully used in Y-DNA studies. RESULTS We developed Y-LineageTracker, a high-throughput analysis framework that not only uses state-of-the-art methodologies to automatically determine NRY haplogroups and identify Y-chromosome microsatellite variants on a fine scale, but also optimizes comprehensive Y-DNA analysis methods for NGS data. Notably, Y-LineageTracker integrates the NRY haplogroup and Y-STR analysis modules with recognized strategies to robustly suggest an interpretation of paternal genetics and evolution. The NRY haplogroup module mainly covers haplogroup classification, clustering analysis, phylogeny construction, and divergence time estimation of NRY haplogroups, and the Y-STR module mainly includes Y-STR genotyping, statistical calculation, network analysis, and estimation of the time to the most recent common ancestor (TMRCA) based on Y-STR haplotypes. A performance comparison indicated that Y-LineageTracker outperformed existing Y-DNA analysis tools, offering high performance and satisfactory visualization. CONCLUSIONS Y-LineageTracker is an open-source and user-friendly command-line tool that provides multiple functions to efficiently analyze Y-DNA from NGS data at both the Y-SNP and Y-STR levels. Additionally, Y-LineageTracker supports various input data formats and produces high-quality figures suitable for publication. Y-LineageTracker is written in Python 3, supports Windows, Linux, and macOS, and can be installed manually or via the Python Package Index (PyPI).
The source code, examples, and manual of Y-LineageTracker are freely available at https://www.picb.ac.cn/PGG/resource.php or CodeOcean ( https://codeocean.com/capsule/7424381/tree ).
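Y-STR-based TMRCA estimation, one of the functions listed in this abstract, can be sketched with the classic average squared distance (ASD) estimator under a symmetric single-step mutation model: after t generations along each of two independent lineages, the expected squared repeat difference at a locus is 2μt, so t ≈ ASD / (2μ). This is a generic textbook estimator offered as an illustration, not necessarily the method Y-LineageTracker implements; the function name, the mutation rate, and the haplotypes below are assumed values.

```python
from typing import Dict


def pairwise_tmrca_generations(a: Dict[str, int], b: Dict[str, int], mu: float) -> float:
    """Generations to the MRCA of two Y-STR haplotypes under a symmetric
    single-step mutation model, where E[(x_a - x_b)^2] = 2 * mu * t per locus."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("haplotypes share no loci")
    # Average squared difference in repeat counts across shared loci.
    asd = sum((a[locus] - b[locus]) ** 2 for locus in shared) / len(shared)
    return asd / (2 * mu)


# Hypothetical haplotypes and an assumed per-locus, per-generation mutation rate.
h1 = {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS392": 13}
h2 = {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS392": 12}
generations = pairwise_tmrca_generations(h1, h2, mu=0.002)
print(round(generations))  # 125
```

Converting generations to years requires a further assumption about generation length, and estimates from a single pair of haplotypes carry wide uncertainty.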
Affiliation(s)
- Hao Chen
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Yan Lu
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- School of Life Sciences, Fudan University, Shanghai, 200433, China
- Dongsheng Lu
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Shuhua Xu
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
- Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, 450052, China
- Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai, 200438, China
8
Choudhury A, Aron S, Botigué LR, Sengupta D, Botha G, Bensellak T, Wells G, Kumuthini J, Shriner D, Fakim YJ, Ghoorah AW, Dareng E, Odia T, Falola O, Adebiyi E, Hazelhurst S, Mazandu G, Nyangiri OA, Mbiyavanga M, Benkahla A, Kassim SK, Mulder N, Adebamowo SN, Chimusa ER, Muzny D, Metcalf G, Gibbs RA, Rotimi C, Ramsay M, Adeyemo AA, Lombard Z, Hanchard NA. High-depth African genomes inform human migration and health. Nature 2020; 586:741-748. [PMID: 33116287 PMCID: PMC7759466 DOI: 10.1038/s41586-020-2859-7] [Citation(s) in RCA: 218] [Impact Index Per Article: 43.6] [Received: 05/10/2019] [Accepted: 08/07/2020] [Indexed: 01/05/2023]
Abstract
The African continent is regarded as the cradle of modern humans and African genomes contain more genetic variation than those from any other continent, yet only a fraction of the genetic diversity among African individuals has been surveyed. Here we performed whole-genome sequencing analyses of 426 individuals, comprising 50 ethnolinguistic groups including previously unsampled populations, to explore the breadth of genomic diversity across Africa. We uncovered more than 3 million previously undescribed variants, most of which were found among individuals from newly sampled ethnolinguistic groups, as well as 62 previously unreported loci that are under strong selection, which were predominantly found in genes involved in viral immunity, DNA repair and metabolism. We observed complex patterns of ancestral admixture and of putatively damaging and novel variation, both within and between populations, alongside evidence that Zambia was a likely intermediate site along the routes of expansion of Bantu-speaking populations. Pathogenic variants in genes that are currently characterized as medically relevant were uncommon, but in other genes, variants denoted as 'likely pathogenic' in the ClinVar database were commonly observed. Collectively, these findings refine our current understanding of continental migration, identify gene flow and the response to human disease as strong drivers of genome-level population variation, and underscore the scientific imperative for a broader characterization of the genomic diversity of African individuals to understand human ancestry and improve health.
Affiliation(s)
- Ananyo Choudhury
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Shaun Aron
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Laura R Botigué
- Center for Research in Agricultural Genomics (CRAG), Plant and Animal Genomics Program, CSIC-IRTA-UAB-UB, Barcelona, Spain
- Dhriti Sengupta
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Gerrit Botha
- Computational Biology Division and H3ABioNet, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Taoufik Bensellak
- System and Data Engineering Team, Abdelmalek Essaadi University, ENSA, Tangier, Morocco
- Gordon Wells
- Centre for Proteomic and Genomic Research (CPGR), Cape Town, South Africa
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Africa Health Research Institute, Durban, South Africa
- Judit Kumuthini
- Centre for Proteomic and Genomic Research (CPGR), Cape Town, South Africa
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Daniel Shriner
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Yasmina J Fakim
- Department of Agriculture and Food Science, Faculty of Agriculture, University of Mauritius, Reduit, Mauritius
- Department of Digital Technologies, Faculty of Information, Communication & Digital Technologies, University of Mauritius, Reduit, Mauritius
- Anisah W Ghoorah
- Department of Digital Technologies, Faculty of Information, Communication & Digital Technologies, University of Mauritius, Reduit, Mauritius
- Eileen Dareng
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Institute of Human Virology Nigeria, Abuja, Nigeria
- Trust Odia
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Oluwadamilare Falola
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
- Scott Hazelhurst
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
- Gaston Mazandu
- Computational Biology Division and H3ABioNet, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Oscar A Nyangiri
- College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, Kampala, Uganda
- Mamana Mbiyavanga
- Computational Biology Division and H3ABioNet, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Alia Benkahla
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (BIMS), Institute Pasteur of Tunis, Tunis, Tunisia
- Samar K Kassim
- Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Abbaseya, Cairo, Egypt
- Nicola Mulder
- Computational Biology Division and H3ABioNet, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Sally N Adebamowo
- Department of Epidemiology and Public Health, University of Maryland School of Medicine, University of Maryland Baltimore, Baltimore, MD, USA
- University of Maryland Greenebaum Comprehensive Cancer Center, University of Maryland School of Medicine, University of Maryland Baltimore, Baltimore, MD, USA
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Faculty of Health Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ginger Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Charles Rotimi
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Michèle Ramsay
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Division of Human Genetics, National Health Laboratory Service, and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Adebowale A Adeyemo
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Zané Lombard
- Division of Human Genetics, National Health Laboratory Service, and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Neil A Hanchard
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
9
Naidoo T, Xu J, Vicente M, Malmström H, Soodyall H, Jakobsson M, Schlebusch CM. Y-Chromosome Variation in Southern African Khoe-San Populations Based on Whole-Genome Sequences. Genome Biol Evol 2020; 12:1031-1039. [PMID: 32697300 PMCID: PMC7375190 DOI: 10.1093/gbe/evaa098] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Accepted: 05/12/2020] [Indexed: 12/30/2022]
Abstract
Although the human Y chromosome has effectively shown utility in uncovering facets of human evolution and population histories, the ascertainment bias present in early Y-chromosome variant data sets limited the accuracy of diversity and TMRCA estimates obtained from them. The advent of next-generation sequencing, however, has removed this bias and allowed the discovery of thousands of new variants for use in improving the Y-chromosome phylogeny and computing more accurate estimates. Here, we describe the high-coverage sequencing of the whole Y chromosome in a data set of 19 male Khoe-San individuals in comparison with existing whole Y-chromosome sequence data. Owing to the increased resolution, we potentially resolve the source of haplogroup B-P70 in the Khoe-San, and reconcile recently published haplogroup A-M51 data with the most recent version of the ISOGG Y-chromosome phylogeny. Our results also improve the positioning of tentatively placed new branches of the ISOGG Y-chromosome phylogeny. The distribution of major Y-chromosome haplogroups in the Khoe-San and other African groups coincides with the emerging picture of African demographic history, with E-M2 linked to the agriculturalist Bantu expansion, E-M35 linked to pastoralist eastern African migrations, B-M112 linked to earlier east-south gene flow, A-M14 linked to shared ancestry with central African rainforest hunter-gatherers, and A-M51 potentially unique to the Khoe-San.
Affiliation(s)
- Thijessen Naidoo
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Sweden
- Department of Archaeology and Classical Studies, Stockholm University, Sweden
- Science for Life Laboratory, Uppsala, Sweden
- Centre for Palaeogenetics, Stockholm, Sweden
- Jingzi Xu
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Sweden
- Mário Vicente
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Sweden
- Helena Malmström
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Sweden
- Palaeo-Research Institute, University of Johannesburg, Auckland Park, South Africa
- Himla Soodyall
- Division of Human Genetics, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- National Health Laboratory Service, Johannesburg, South Africa
- Academy of Science of South Africa
- Mattias Jakobsson
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Sweden
- Science for Life Laboratory, Uppsala, Sweden
- Palaeo-Research Institute, University of Johannesburg, Auckland Park, South Africa
- Carina M Schlebusch
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Sweden
- Science for Life Laboratory, Uppsala, Sweden
- Palaeo-Research Institute, University of Johannesburg, Auckland Park, South Africa
10
AlSafar HS, Al-Ali M, Elbait GD, Al-Maini MH, Ruta D, Peramo B, Henschel A, Tay GK. Introducing the first whole genomes of nationals from the United Arab Emirates. Sci Rep 2019; 9:14725. [PMID: 31604968 PMCID: PMC6789106 DOI: 10.1038/s41598-019-50876-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/16/2018] [Accepted: 09/20/2019] [Indexed: 12/30/2022]
Abstract
Whole Genome Sequencing (WGS) provides an in-depth description of genome variation. In the era of large-scale population genome projects, the assembly of ethnic-specific genomes combined with mapping human reference genomes of underrepresented populations has improved the understanding of human diversity and disease associations. In this study, for the first time, whole genome sequences of two nationals of the United Arab Emirates (UAE) at >27X coverage are reported. The two Emirati individuals were predominantly of Central/South Asian ancestry. An in-house customized pipeline using BWA and Picard, followed by GATK tools, was used to map the raw whole-genome sequence data of both individuals. A total of 3,994,521 variants (3,350,574 single nucleotide polymorphisms (SNPs) and 643,947 indels) were identified in the first individual, the UAE S001 sample. A similar number of variants, 4,031,580 (3,373,501 SNPs and 658,079 indels), were identified in UAE S002. Variants associated with diabetes, hypertension, increased cholesterol levels, and obesity were also identified in these individuals. These whole genome sequences provide a starting point for constructing a UAE reference panel, which will lead to improvements in the delivery of precision medicine, quality of life for affected individuals, and a reduction in healthcare costs. The information compiled will likely lead to the identification of target genes that could support the development of novel therapeutic modalities.
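The per-sample totals quoted above are sums of SNP and indel counts. A short Python check, using only the figures given in the abstract, confirms the arithmetic; the dictionary layout and the `check_totals` helper are illustrative only and are not part of the study's pipeline.

```python
# Variant counts as quoted in the abstract (illustrative data layout).
counts = {
    "UAE S001": {"snps": 3_350_574, "indels": 643_947, "reported_total": 3_994_521},
    "UAE S002": {"snps": 3_373_501, "indels": 658_079, "reported_total": 4_031_580},
}


def check_totals(samples):
    """Return {sample: (computed_total, matches_reported_total)}."""
    results = {}
    for name, c in samples.items():
        computed = c["snps"] + c["indels"]  # total variants = SNPs + indels
        results[name] = (computed, computed == c["reported_total"])
    return results


for name, (total, ok) in check_totals(counts).items():
    print(f"{name}: {total:,} variants (matches reported total: {ok})")
```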
Affiliation(s)
- Habiba S AlSafar
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Mariam Al-Ali
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Gihan Daw Elbait
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Dymitr Ruta
- Etisalat-British Telecom Innovation Center, Abu Dhabi, United Arab Emirates
- Andreas Henschel
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Computer Science, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Guan K Tay
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; School of Psychiatry and Clinical Neurosciences, University of Western Australia, Nedlands, Australia; School of Medical and Health Sciences, Edith Cowan University, Joondalup, Australia
11
Lorente-Galdos B, Lao O, Serra-Vidal G, Santpere G, Kuderna LFK, Arauna LR, Fadhlaoui-Zid K, Pimenoff VN, Soodyall H, Zalloua P, Marques-Bonet T, Comas D. Whole-genome sequence analysis of a Pan African set of samples reveals archaic gene flow from an extinct basal population of modern humans into sub-Saharan populations. Genome Biol 2019; 20:77. [PMID: 31023378 PMCID: PMC6485163 DOI: 10.1186/s13059-019-1684-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 08/17/2018] [Accepted: 03/28/2019] [Indexed: 12/30/2022]
Abstract
Background: Population demography and gene flow among African groups, as well as the putative archaic introgression of ancient hominins, have been poorly explored at the genome level. Results: Here, we examine 15 African populations covering all major continental linguistic groups, ecosystems, and lifestyles within Africa through analysis of whole-genome sequence data of 21 individuals sequenced at deep coverage. We observe a remarkable correlation between genetic diversity and geographic distance, with the hunter-gatherer groups being more genetically differentiated and having larger effective population sizes throughout most of modern-human history. Admixture signals are found between neighboring populations of both hunter-gatherer and agriculturalist groups, whereas North African individuals are closely related to Eurasian populations. Regarding archaic gene flow, we test six complex demographic models that consider recent admixture as well as archaic introgression. We identify the fingerprint of an archaic introgression event in the sub-Saharan populations included in the models (~4.0% in Khoisan, ~4.3% in Mbuti Pygmies, and ~5.8% in Mandenka) from an early divergent and currently extinct ghost modern human lineage. Conclusion: The present study represents an in-depth genomic analysis of a Pan African set of individuals, which emphasizes their complex relationships and demographic history at the population level.
Affiliation(s)
- Belen Lorente-Galdos
- Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain; Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
- Oscar Lao
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Gerard Serra-Vidal
- Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Gabriel Santpere
- Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain; Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
- Lukas F K Kuderna
- Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Lara R Arauna
- Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Karima Fadhlaoui-Zid
- College of Science, Department of Biology, Taibah University, Al Madinah, Al Monawarah, Saudi Arabia; Higher Institute of Biotechnology of Beja, University of Jendouba, Avenue Habib Bourguiba, BP, 382, 9000, Beja, Tunisia
- Ville N Pimenoff
- Oncology Data Analytics Program, Bellvitge Biomedical Research Institute (ICO-IDIBELL), Consortium for Biomedical Research in Epidemiology and Public Health, Hospitalet de Llobregat, Barcelona, Spain; Department of Archaeology, University of Helsinki, Helsinki, Finland
- Himla Soodyall
- Division of Human Genetics, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand and National Health Laboratory Service, Johannesburg, South Africa
- Pierre Zalloua
- School of Medicine, The Lebanese American University, Beirut, 1102-2801, Lebanon
- Tomas Marques-Bonet
- Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain; CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, ICREA, 08003, Barcelona, Spain
- David Comas
- Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
12
Ralf A, Montiel González D, Zhong K, Kayser M. Yleaf: Software for Human Y-Chromosomal Haplogroup Inference from Next-Generation Sequencing Data. Mol Biol Evol 2019. [PMID: 29518227 DOI: 10.1093/molbev/msy032] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Indexed: 12/20/2022]
Abstract
Next-generation sequencing (NGS) technologies offer immense possibilities given the large amount of genomic data they deliver simultaneously. The human Y chromosome serves as a good example of how NGS benefits various applications in evolution, anthropology, genealogy, and forensics. Prior to NGS, the Y-chromosome phylogenetic tree consisted of a few hundred branches; based on NGS data, it now contains many thousands. The complexity of both the Y tree and NGS data poses challenges for haplogroup assignment. For effective analysis and interpretation of Y-chromosome NGS data, we present Yleaf, a publicly available, automated, user-friendly software for high-resolution Y-chromosome haplogroup inference, independent of library and sequencing methods.
Affiliation(s)
- Arwin Ralf
- Department of Genetic Identification, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands
- Diego Montiel González
- Department of Genetic Identification, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands
- Kaiyin Zhong
- Department of Genetic Identification, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands
- Manfred Kayser
- Department of Genetic Identification, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands
13
Bai H, Guo X, Narisu N, Lan T, Wu Q, Xing Y, Zhang Y, Bond SR, Pei Z, Zhang Y, Zhang D, Jirimutu J, Zhang D, Yang X, Morigenbatu M, Zhang L, Ding B, Guan B, Cao J, Lu H, Liu Y, Li W, Dang N, Jiang M, Wang S, Xu H, Wang D, Liu C, Luo X, Gao Y, Li X, Wu Z, Yang L, Meng F, Ning X, Hashenqimuge H, Wu K, Wang B, Suyalatu S, Liu Y, Ye C, Wu H, Leppälä K, Li L, Fang L, Chen Y, Xu W, Li T, Liu X, Xu X, Gignoux CR, Yang H, Brody LC, Wang J, Kristiansen K, Burenbatu B, Zhou H, Yin Y. Whole-genome sequencing of 175 Mongolians uncovers population-specific genetic architecture and gene flow throughout North and East Asia. Nat Genet 2018; 50:1696-1704. [PMID: 30397334 DOI: 10.1038/s41588-018-0250-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 01/19/2017] [Accepted: 09/03/2018] [Indexed: 12/30/2022]
Abstract
The genetic variation in Northern Asian populations is currently undersampled. To address this, we generated a new genetic variation reference panel by whole-genome sequencing of 175 ethnic Mongolians, representing six tribes. The cataloged variation in the panel shows strong population stratification among these tribes, which correlates with the diverse demographic histories in the region. Incorporating our results with the 1000 Genomes Project panel identifies derived alleles shared between Finns and Mongolians/Siberians, suggesting that substantial gene flow between northern Eurasian populations has occurred in the past. Furthermore, we highlight that North, East, and Southeast Asian populations are more aligned with each other than these groups are with South Asian and Oceanian populations.
Affiliation(s)
- Haihua Bai
- School of Life Science, Inner Mongolia University for the Nationalities, Tongliao, China; Inner Mongolia Engineering Research Center of Personalized Medicine, Tongliao, China
- Xiaosen Guo
- BGI-Shenzhen, Shenzhen, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Narisu Narisu
- Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Tianming Lan
- BGI-Shenzhen, Shenzhen, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Qizhu Wu
- Affiliated Hospital of Inner Mongolia University for the Nationalities, Tongliao, China
- Yanping Xing
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Yong Zhang
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Stephen R Bond
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Zhili Pei
- College of Computer Science and Technology, Inner Mongolia University for the Nationalities, Tongliao, China
- Yanru Zhang
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Dandan Zhang
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Jirimutu Jirimutu
- College of Mathematics, Inner Mongolia University for the Nationalities, Tongliao, China
- Dong Zhang
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Xukui Yang
- BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Morigenbatu Morigenbatu
- College of Mongolian Studies, Inner Mongolia University for the Nationalities, Tongliao, China
- Li Zhang
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Bingyi Ding
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Baozhu Guan
- Inner Mongolia International Mongolian Hospital, Hohhot, China
- Junwei Cao
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Haorong Lu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China; Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen, China
- Yiyi Liu
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Wangsheng Li
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Ningxin Dang
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Mingyang Jiang
- College of Computer Science and Technology, Inner Mongolia University for the Nationalities, Tongliao, China
- Shenyuan Wang
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Huixin Xu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Dingzhu Wang
- College of Mongolian Studies, Inner Mongolia University for the Nationalities, Tongliao, China
- Chunxia Liu
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Xin Luo
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Ying Gao
- School of Life Science, Inner Mongolia University for the Nationalities, Tongliao, China
- Xueqiong Li
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Zongze Wu
- Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark; BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Liqing Yang
- Affiliated Hospital of Inner Mongolia University for the Nationalities, Tongliao, China
- Fanhua Meng
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Xiaolian Ning
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Kaifeng Wu
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Bo Wang
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Suyalatu Suyalatu
- School of Life Science, Inner Mongolia University for the Nationalities, Tongliao, China
- Yingchun Liu
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Chen Ye
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Huiguang Wu
- School of Life Science, Inner Mongolia University for the Nationalities, Tongliao, China
- Kalle Leppälä
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Lu Li
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Lin Fang
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Yujie Chen
- School of Life Science, Inner Mongolia University for the Nationalities, Tongliao, China
- Wenhao Xu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China; College of Life Science and Technology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, China
- Tao Li
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Xin Liu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Xun Xu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Huanming Yang
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China; James D. Watson Institute of Genome Sciences, Hangzhou, China
- Lawrence C Brody
- Gene and Environment Interaction Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jun Wang
- BGI-Shenzhen, Shenzhen, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Karsten Kristiansen
- BGI-Shenzhen, Shenzhen, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Burenbatu Burenbatu
- Affiliated Hospital of Inner Mongolia University for the Nationalities, Tongliao, China
- Huanmin Zhou
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, China
- Ye Yin
- Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark; BGI Genomics, BGI-Shenzhen, Shenzhen, China; School of Life Science and Biotechnology, Dalian University of Technology, Dalian, China
14
Claerhout S, Vandenbosch M, Nivelle K, Gruyters L, Peeters A, Larmuseau MH, Decorte R. Determining Y-STR mutation rates in deep-routing genealogies: Identification of haplogroup differences. Forensic Sci Int Genet 2018; 34:1-10. [DOI: 10.1016/j.fsigen.2018.01.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 11/15/2017] [Revised: 01/10/2018] [Accepted: 01/14/2018] [Indexed: 10/18/2022]
15
Whole-genome sequencing for an enhanced understanding of genetic variation among South Africans. Nat Commun 2017; 8:2062. [PMID: 29233967 PMCID: PMC5727231 DOI: 10.1038/s41467-017-00663-9] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 09/29/2016] [Accepted: 07/17/2017] [Indexed: 11/08/2022]
Abstract
The Southern African Human Genome Programme is a national initiative that aspires to unlock the unique genetic character of southern African populations for a better understanding of human genetic diversity. In this pilot study the Southern African Human Genome Programme characterizes the genomes of 24 individuals (8 Coloured and 16 black southeastern Bantu-speakers) using deep whole-genome sequencing. A total of ~16 million unique variants are identified. Despite the shallow time depth since divergence between the two main southeastern Bantu-speaking groups (Nguni and Sotho-Tswana), principal component analysis and structure analysis reveal significant (p < 10−6) differentiation, and FST analysis identifies regions with high divergence. The Coloured individuals show evidence of varying proportions of admixture with Khoesan, Bantu-speakers, Europeans, and populations from the Indian sub-continent. Whole-genome sequencing data reveal extensive genomic diversity, increasing our understanding of the complex and region-specific history of African populations and highlighting its potential impact on biomedical research and genetic susceptibility to disease. African populations show a high level of genetic diversity and extensive regional admixture. Here, the authors sequence the whole genomes of 24 South African individuals of different ethnolinguistic origin and find substantive genomic divergence between two southeastern Bantu-speaking groups.
16
Whole Y-chromosome sequences reveal an extremely recent origin of the most common North African paternal lineage E-M183 (M81). Sci Rep 2017; 7:15941. [PMID: 29162904 PMCID: PMC5698413 DOI: 10.1038/s41598-017-16271-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/10/2017] [Accepted: 11/09/2017] [Indexed: 12/30/2022]
Abstract
E-M183 (E-M81) is the most frequent paternal lineage in North Africa, and it must therefore be considered when exploring past historical and demographic processes. Here, using whole Y chromosome sequences from 32 North African individuals, we have identified five new branches within E-M183. Validation of these variants in more than 200 North African samples, for which we also have data on 13 Y-STRs, revealed a strong resemblance among E-M183 Y-STR haplotypes, pointing to a rapid expansion of this haplogroup. Moreover, for the first time, using both SNP and STR data, we have provided updated estimates of the time to the most recent common ancestor (TMRCA) for E-M183, which indicate an extremely recent origin of this haplogroup (2,000-3,000 ya). Our results also showed a lack of population structure within the E-M183 branch, which could be explained by the recent and rapid expansion of this haplogroup. Despite a reduction in STR heterozygosity towards the West, which would point to an origin in the Near East, ancient DNA evidence together with our TMRCA estimates points to a local origin of E-M183 in NW Africa.
17
Defining Y-SNP variation among the Flemish population (Western Europe) by full genome sequencing. Forensic Sci Int Genet 2017; 31:e12-e16. [PMID: 29089250 DOI: 10.1016/j.fsigen.2017.10.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/13/2017] [Revised: 10/10/2017] [Accepted: 10/24/2017] [Indexed: 12/27/2022]
Abstract
Y-chromosomal single nucleotide polymorphisms (Y-SNPs) represent a powerful tool in forensic research and casework, especially for inferring the paternal ancestry of unknown perpetrators and unidentified bodies. However, the wealth of recently discovered Y-SNPs, the 'jungle' of different evolutionary lineage trees and nomenclatures, and the lack of population-wide data for many phylogenetically mapped Y-SNPs limit the use of Y-SNPs in routine forensic approaches. Recently, a concise reference phylogeny of the human Y chromosome, the 'Minimal Reference Y-tree', was introduced, aiming to provide a stable phylogeny with optimal global discrimination capacity by including the most resolving Y-SNPs. Here, we obtained a representative sample of 270 whole-genome sequences (WGS) to capture the Y-SNP variation within the autochthonous Flemish population (Belgium, Western Europe) according to this reference Y-tree. The high quality of the Y-SNP calling was guaranteed for the WGS sample, as was its representativeness for the Flemish population, based on a comparison of the main haplogroup frequencies with those from earlier studies on Flanders and the Netherlands. The 270 Flemish Y chromosomes were assigned to 98 different sub-haplogroups of the Minimal Reference Y-tree, showing its high discrimination potential and confirming the spectrum of evolutionary lineages within Western Europe in general and within Flanders in particular. The full database with all Y-SNP calls of the Flemish sample is publicly available for future updates, including forensic and population genetic studies. New initiatives to categorise Y-SNP variation in other populations according to the reference phylogeny of the Y chromosome are highly encouraged for forensic applications. Recommendations for assembling such future population sample sets are discussed based on this study.
18
Alvarez-Cubero MJ, Saiz M, Martínez-García B, Sayalero SM, Entrala C, Lorente JA, Martinez-Gonzalez LJ. Next generation sequencing: an application in forensic sciences? Ann Hum Biol 2017; 44:581-592. [PMID: 28948844 DOI: 10.1080/03014460.2017.1375155] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/13/2023]
Abstract
CONTEXT: Over the last few decades, sequencing technology has advanced greatly. One of the most important achievements of Next Generation Sequencing (NGS) is the ability to produce millions of sequence reads in a short period of time, and to produce large sequences of DNA in fragments of any size. Libraries can be generated from whole genomes or from any DNA or RNA region of interest without the need to know its sequence beforehand. This makes it possible to search for variation and facilitates genetic identification. OBJECTIVES: An in-depth analysis of current NGS technologies and their applications, especially in forensics, including a discussion of the pros and cons of these technologies in genetic identification. METHODS: A systematic literature search of the PubMed, Science Direct and Scopus electronic databases was performed for the period December 2012 to June 2015. RESULTS: In the forensic field, one of the main problems is the limited amount of sample available, as well as its degraded state. If the amount of DNA input required for preparing NGS libraries continues to decrease, nearly any sample could be sequenced, and the maximum information could therefore be obtained from any biological remains. Additionally, microbiome typing could be an interesting application to study for crime scene characterisation. CONCLUSIONS: NGS technologies are going to be crucial for human DNA typing in cases such as mass disasters or other events where forensic specimens and samples are compromised and degraded. With NGS it will be possible to analyse standard autosomal DNA markers (STRs and SNPs), mitochondrial DNA, and X and Y chromosomal markers simultaneously.
Affiliation(s)
- Maria Jesus Alvarez-Cubero
- GENYO, Centro Pfizer-Universidad de Granada-Junta de Andalucía de Genómica e Investigación Oncológica, Parque Tecnológico de Ciencias de la Salud (PTS), Granada, Spain
- Maria Saiz
- Laboratorio de Identificación Genética, Departamento de Medicina Legal, Toxicología y Antropología Física, Facultad de Medicina, Universidad de Granada, Granada, Spain
- Belén Martínez-García
- GENYO, Centro Pfizer-Universidad de Granada-Junta de Andalucía de Genómica e Investigación Oncológica, Parque Tecnológico de Ciencias de la Salud (PTS), Granada, Spain
- Sara M Sayalero
- CRAG - Centre de Recerca en Agrigenòmica - CSIC IRTA UAB UB, Barcelona, Spain
- Carmen Entrala
- LORGEN G.P., PT, Ciencias de la Salud - BIC, Granada, Spain
- Jose Antonio Lorente
- GENYO, Centro Pfizer-Universidad de Granada-Junta de Andalucía de Genómica e Investigación Oncológica, Parque Tecnológico de Ciencias de la Salud (PTS), Granada, Spain; Laboratorio de Identificación Genética, Departamento de Medicina Legal, Toxicología y Antropología Física, Facultad de Medicina, Universidad de Granada, Granada, Spain
- Luis Javier Martinez-Gonzalez
- GENYO, Centro Pfizer-Universidad de Granada-Junta de Andalucía de Genómica e Investigación Oncológica, Parque Tecnológico de Ciencias de la Salud (PTS), Granada, Spain
19
Qian X, Hou J, Wang Z, Ye Y, Lang M, Gao T, Liu J, Hou Y. Next Generation Sequencing Plus (NGS+) with Y-chromosomal Markers for Forensic Pedigree Searches. Sci Rep 2017; 7:11324. [PMID: 28900279 PMCID: PMC5595879 DOI: 10.1038/s41598-017-11955-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 07/04/2017] [Accepted: 09/01/2017] [Indexed: 11/17/2022]
Abstract
There is high demand for forensic pedigree searches with Y-chromosome short tandem repeat (Y-STR) profiling in large-scale crime investigations. However, when two Y-STR haplotypes have a few mismatched loci, it is difficult to determine whether they come from the same male lineage because of the high mutation rate of Y-STRs. Here we design a new strategy to handle cases in which none of the pedigree samples shares an identical Y-STR haplotype. We combine next generation sequencing (NGS), capillary electrophoresis and pyrosequencing under the term ‘NGS+’ for typing Y-STRs and Y-chromosomal single nucleotide polymorphisms (Y-SNPs). Both a high-resolution Y-SNP haplogroup and a Y-STR haplotype can be obtained with NGS+. We further developed a new data-driven decision rule, FSindex, for estimating the likelihood of each retrieved pedigree. Our approach enables positive identification of a pedigree from mismatched Y-STR haplotypes. It is envisaged that NGS+ will revolutionize forensic pedigree searches, especially when the person of interest is not recorded in a forensic DNA database.
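To make the notion of "mismatched loci" concrete, the Python sketch below compares two Y-STR haplotypes stored as locus-to-repeat-count mappings and reports which shared loci differ and by how many repeat steps in total. It is an illustration only, not the NGS+ workflow or the FSindex decision rule: the allele values are hypothetical, the `compare_haplotypes` helper is an assumed name, and a real comparison would also need to handle multi-copy loci (e.g. DYS385a/b), null or intermediate alleles, and locus-specific mutation rates.

```python
from typing import Dict, List, Tuple

Haplotype = Dict[str, int]  # locus name -> repeat count (hypothetical values)


def compare_haplotypes(a: Haplotype, b: Haplotype) -> Tuple[List[str], int]:
    """Return (mismatched loci, total repeat-step difference) over loci typed in both.

    Loci typed in only one haplotype are skipped rather than counted as mismatches.
    """
    shared = sorted(set(a) & set(b))
    mismatched = [locus for locus in shared if a[locus] != b[locus]]
    step_distance = sum(abs(a[locus] - b[locus]) for locus in mismatched)
    return mismatched, step_distance


# Hypothetical query and database haplotypes (illustrative allele values only).
query = {"DYS19": 15, "DYS389I": 13, "DYS390": 24, "DYS391": 10, "DYS392": 11, "DYS393": 12}
candidate = {"DYS19": 15, "DYS389I": 14, "DYS390": 24, "DYS391": 10, "DYS392": 13, "DYS393": 12}

loci, steps = compare_haplotypes(query, candidate)
print(f"{len(loci)} mismatched loci: {loci}; total step difference: {steps}")
```

Reporting the step difference alongside the mismatch count reflects the usual observation that most Y-STR mutations change the repeat number by a single step, so a one-step difference at a locus is generally easier to reconcile with a shared lineage than a multi-step one.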
Affiliation(s)
- Xiaoqin Qian
- Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu, 610041, China
- Jiayi Hou
- Clinical and Translational Research Institute, University of California, San Diego, La Jolla, CA, 92093, USA
- Zheng Wang
- Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu, 610041, China
- Yi Ye
- Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu, 610041, China
- Min Lang
- Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu, 610041, China
- Tianzhen Gao
- Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu, 610041, China
- Jing Liu
- Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu, 610041, China
- Yiping Hou
- Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu, 610041, China
20
Wang CC, Huang Y, Yu X, Chen C, Jin L, Li H. Agriculture driving male expansion in Neolithic Time. Sci China Life Sci 2016; 59:643-6. [PMID: 27132019 DOI: 10.1007/s11427-016-5057-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/13/2016] [Accepted: 02/03/2016] [Indexed: 11/26/2022]
Affiliation(s)
- Chuan-Chao Wang
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Yunzhi Huang
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Xue'er Yu
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Chun Chen
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Li Jin
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Hui Li
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200438, China
21
The Paternal Landscape along the Bight of Benin - Testing Regional Representativeness of West-African Population Samples Using Y-Chromosomal Markers. PLoS One 2015; 10:e0141510. [PMID: 26544036 PMCID: PMC4636292 DOI: 10.1371/journal.pone.0141510] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/25/2015] [Accepted: 10/08/2015] [Indexed: 11/19/2022]
Abstract
Patterns of genetic variation in human populations across the African continent remain poorly studied compared with those of Eurasia and the Americas, despite the high genetic and cultural diversity among African populations. In population and forensic genetic studies, a single sample is often used to represent an entire African region. In such a scenario, inappropriate sampling strategies and/or the use of local, isolated populations may bias interpretations and raise questions of representativeness at a macrogeographic scale. The non-recombining region of the Y chromosome (NRY) has great potential to reveal the regional representativeness of a sample because of its powerful phylogeographic information content. The West-African region along the Bight of Benin is poorly characterized for Y-chromosomal data, despite its important history in the trans-Atlantic slave trade and its large number of ethnic groups, languages and lifestyles. In this study, Y-chromosomal haplotypes from four Beninese populations were determined, and a global meta-analysis was performed with available Y-SNP and Y-STR data from populations along the Bight of Benin and surrounding areas. A thorough methodology was developed that allows population samples to be compared using Y-chromosomal lineage data based on different Y-SNP panels and phylogenies. Geographic proximity turned out to be the best predictor of genetic affinity between populations along the Bight of Benin. Nevertheless, based on Y-chromosomal data from the literature, two population samples differed strongly from others from the same or neighbouring areas and are not regionally representative within large-scale studies. Furthermore, analysis of Y-SNP and Y-STR data from the HapMap sample YRI, a Yoruba population from south-western Nigeria, showed for the first time that it is regionally representative, a result that is important for standard population and forensic genetic applications using the YRI sample.
Therefore, the unique and powerful geographical information carried by the Y chromosome makes it an important locus for testing the representativeness of a given sample even in the genomic era, especially in poorly investigated areas such as Africa.
22
Johansson MM, Van Geystelen A, Larmuseau MHD, Djurovic S, Andreassen OA, Agartz I, Jazin E. Microarray Analysis of Copy Number Variants on the Human Y Chromosome Reveals Novel and Frequent Duplications Overrepresented in Specific Haplogroups. PLoS One 2015; 10:e0137223. [PMID: 26322892 PMCID: PMC4554990 DOI: 10.1371/journal.pone.0137223] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 02/26/2015] [Accepted: 08/13/2015] [Indexed: 11/18/2022]
Abstract
BACKGROUND The human Y chromosome is almost always excluded from genome-wide investigations of copy number variants (CNVs) because of its highly repetitive structure. This chromosome should not be overlooked, not only because of its well-known relevance to male fertility, but also because of its involvement in clinical phenotypes such as cancers, heart failure and sex-specific effects on brain and behaviour. RESULTS We analysed Y chromosome data from Affymetrix 6.0 SNP arrays and found that the signal intensities of most of the 8179 SNP/CN probes in the male-specific region (MSY) discriminated between a male, background signals in a female, and an isodicentric male carrying a large deletion of the q-arm and a duplication of the p-arm of the Y chromosome. This SNP/CN platform is therefore suitable for identifying gains and losses of Y chromosome sequence. In a set of 1718 males, we found 25 different CNV patterns, many of which are novel. We confirmed some of these variants by PCR or qPCR. The total frequency of individuals with CNVs was 14.7%, including 9.5% with duplications, 4.5% with deletions and 0.7% with both. Hence, a novel observation is that duplications were more than twice as frequent as deletions. Another striking result was that 10 of the 25 detected variants were significantly overrepresented in one or more haplogroups, demonstrating the importance of controlling for haplogroup in genome-wide investigations to avoid stratification. NO-M214(xM175) individuals presented the highest percentage (95%) of CNVs. When these individuals were excluded, 12.4% of the remaining males carried CNVs, and the difference between duplications (8.9%) and deletions (2.8%) was even larger. CONCLUSIONS Our results demonstrate that currently available genome-wide SNP platforms can be used to identify duplications and deletions in the human Y chromosome.
Future association studies of the full spectrum of Y chromosome variants will demonstrate the potential involvement of gain or loss of Y chromosome sequence in different human phenotypes.
Affiliation(s)
- Martin M. Johansson
- Department of Organismal Biology, EBC, Uppsala University, Uppsala, Sweden
- Anneleen Van Geystelen
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- Maarten H. D. Larmuseau
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- Forensic Biomedical Sciences, Department of Imaging and Pathology, KU Leuven, Leuven, Belgium
- Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical Science, University of Bergen, Bergen, Norway
- Ole A. Andreassen
- NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Ingrid Agartz
- NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Department of Psychiatric Research, Diakonhjemmet Hospital, Oslo, Norway
- Elena Jazin
- Department of Organismal Biology, EBC, Uppsala University, Uppsala, Sweden
23
Kehdy FSG, Gouveia MH, Machado M, Magalhães WCS, Horimoto AR, Horta BL, Moreira RG, Leal TP, Scliar MO, Soares-Souza GB, Rodrigues-Soares F, Araújo GS, Zamudio R, Sant Anna HP, Santos HC, Duarte NE, Fiaccone RL, Figueiredo CA, Silva TM, Costa GNO, Beleza S, Berg DE, Cabrera L, Debortoli G, Duarte D, Ghirotto S, Gilman RH, Gonçalves VF, Marrero AR, Muniz YC, Weissensteiner H, Yeager M, Rodrigues LC, Barreto ML, Lima-Costa MF, Pereira AC, Rodrigues MR, Tarazona-Santos E. Origin and dynamics of admixture in Brazilians and its effect on the pattern of deleterious mutations. Proc Natl Acad Sci U S A 2015; 112:8696-701. [PMID: 26124090 PMCID: PMC4507185 DOI: 10.1073/pnas.1504447112] [Citation(s) in RCA: 189] [Impact Index Per Article: 18.9] [Indexed: 01/13/2023]
Abstract
While South Americans are underrepresented in human genomic diversity studies, Brazil has been a classical model for population genetic studies of admixture. We present the results of the EPIGEN Brazil Initiative, the most comprehensive genomic analysis to date of any Latin-American population. A population-based genome-wide analysis of 6,487 individuals was performed in the context of worldwide genomic diversity to elucidate how ancestry, kinship, and inbreeding interact in three populations with different histories from the Northeast (African ancestry: 50%), Southeast, and South (both with European ancestry >70%) of Brazil. We showed that ancestry-positive assortative mating permeated Brazilian history. We traced the European ancestry of the Southeast/South to a wider European/Middle Eastern region than that of the Northeast, where European ancestry seems restricted to Iberia. Using an approximate Bayesian computation framework that we developed, we inferred more recent European immigration to the Southeast/South than to the Northeast. Also, the low Native-American ancestry observed (6-8%) was mostly introduced in different regions of Brazil soon after the European Conquest. We broadened our understanding of the African diaspora, whose major destination was Brazil, by revealing that Brazilians display two within-Africa ancestry components: one associated with non-Bantu/western Africans (more evident in the Northeast and in African Americans) and one associated with Bantu/eastern Africans (more present in the Southeast/South). Furthermore, whole-genome analysis of 30 individuals (42-fold deep coverage) shows that continental admixture, rather than local post-Columbian history, is the main and complex determinant of the individual load of deleterious genotypes.
Affiliation(s)
- Fernanda S G Kehdy
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Mateus H Gouveia
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Moara Machado
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Wagner C S Magalhães
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Andrea R Horimoto
- Instituto do Coração, Universidade de São Paulo, 05403-900, São Paulo, São Paulo, Brazil
- Bernardo L Horta
- Programa de Pós-Graduação em Epidemiologia, Universidade Federal de Pelotas, 464, 96001-970 Pelotas, Rio Grande do Sul, Brazil
- Rennan G Moreira
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Thiago P Leal
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Marilia O Scliar
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Giordano B Soares-Souza
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Fernanda Rodrigues-Soares
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Gilderlanio S Araújo
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Roxana Zamudio
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Hanaisa P Sant Anna
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Hadassa C Santos
- Instituto do Coração, Universidade de São Paulo, 05403-900, São Paulo, São Paulo, Brazil
- Nubia E Duarte
- Instituto do Coração, Universidade de São Paulo, 05403-900, São Paulo, São Paulo, Brazil
- Rosemeire L Fiaccone
- Departamento de Estatística, Instituto de Matemática, Universidade Federal da Bahia, 40170-110, Salvador, Bahia, Brazil
- Camila A Figueiredo
- Departamento de Ciências da Biointeração, Instituto de Ciências da Saúde, Universidade Federal da Bahia, 40110-100, Salvador, Bahia, Brazil
- Thiago M Silva
- Instituto de Saúde Coletiva, Universidade Federal da Bahia, 40110-040, Salvador, Bahia, Brazil
- Gustavo N O Costa
- Instituto de Saúde Coletiva, Universidade Federal da Bahia, 40110-040, Salvador, Bahia, Brazil
- Sandra Beleza
- Department of Genetics, University of Leicester, LE1 7RH, Leicester, United Kingdom
- Douglas E Berg
- Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110; Department of Medicine, University of California, San Diego, CA 92093
- Lilia Cabrera
- Biomedical Research Unit, Asociación Benéfica Proyectos en Informática, Salud, Medicina y Agricultura (AB PRISMA), 170070, Lima, Peru
- Guilherme Debortoli
- Departamento de Biologia Celular, Embriologia e Genética, Universidade Federal de Santa Catarina, 88040-900, Florianópolis, Santa Catarina, Brazil
- Denise Duarte
- Departamento de Estatística, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Silvia Ghirotto
- Dipartimento di Scienze della Vita e Biotecnologie, Università di Ferrara, 44121 Ferrara, Italy
- Robert H Gilman
- Bloomberg School of Public Health, International Health, Johns Hopkins University, Baltimore, MD 21205; Laboratorio de Investigación de Enfermedades Infecciosas, Universidad Peruana Cayetano Heredia, 15102, Lima, Peru
- Vanessa F Gonçalves
- Department of Psychiatry and Neuroscience Section, Center for Addiction and Mental Health, University of Toronto, Toronto, ON, Canada M5T 1R8
- Andrea R Marrero
- Departamento de Biologia Celular, Embriologia e Genética, Universidade Federal de Santa Catarina, 88040-900, Florianópolis, Santa Catarina, Brazil
- Yara C Muniz
- Departamento de Biologia Celular, Embriologia e Genética, Universidade Federal de Santa Catarina, 88040-900, Florianópolis, Santa Catarina, Brazil
- Hansi Weissensteiner
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, 6020 Innsbruck, Austria
- Meredith Yeager
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 20850
- Laura C Rodrigues
- Department of Infectious Disease Epidemiology, Faculty of Epidemiology, London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom
- Mauricio L Barreto
- Instituto de Saúde Coletiva, Universidade Federal da Bahia, 40110-040, Salvador, Bahia, Brazil
- M Fernanda Lima-Costa
- Instituto de Pesquisa René Rachou, Fundação Oswaldo Cruz, 30190-002, Belo Horizonte, Minas Gerais, Brazil
- Alexandre C Pereira
- Instituto do Coração, Universidade de São Paulo, 05403-900, São Paulo, São Paulo, Brazil
- Maíra R Rodrigues
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
- Eduardo Tarazona-Santos
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, Minas Gerais, Brazil
24
Large-scale recent expansion of European patrilineages shown by population resequencing. Nat Commun 2015; 6:7152. [PMID: 25988751 PMCID: PMC4441248 DOI: 10.1038/ncomms8152] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 05/27/2014] [Accepted: 04/13/2015] [Indexed: 12/12/2022]
Abstract
The proportion of Europeans descending from Neolithic farmers ~10 thousand years ago (KYA) or from Palaeolithic hunter-gatherers has been much debated. The male-specific region of the Y chromosome (MSY) has been widely applied to this question, but unbiased estimates of diversity and time depth have been lacking. Here we show that European patrilineages underwent a recent continent-wide expansion. Resequencing of 3.7 Mb of MSY DNA in 334 males, comprising 17 European and Middle Eastern populations, defines a phylogeny containing 5,996 single-nucleotide polymorphisms. Dating indicates that three major lineages (I1, R1a and R1b), accounting for 64% of our sample, have very recent coalescent times, ranging between 3.5 and 7.3 KYA. A continuous swathe of 13 of the 17 populations share similar histories featuring a demographic expansion starting ~2.1-4.2 KYA. Our results are compatible with ancient MSY DNA data and contrast with data on mitochondrial DNA, indicating a widespread male-specific phenomenon that focuses interest on the social structure of Bronze Age Europe.
25
John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. Kuwaiti population subgroup of nomadic Bedouin ancestry-Whole genome sequence and analysis. GENOMICS DATA 2015; 3:116-27. [PMID: 26484159 PMCID: PMC4535864 DOI: 10.1016/j.gdata.2014.11.016] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/30/2014] [Revised: 11/27/2014] [Accepted: 11/28/2014] [Indexed: 12/21/2022]
Abstract
The native Kuwaiti population comprises three distinct genetic subgroups, of Persian, "city-dwelling" Saudi Arabian tribal, and nomadic "tent-dwelling" Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry and owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious 'novel' variants lie in genes associated with autosomal recessive disorders characteristic of the region.
Affiliation(s)
- Osama Alsmadi
- Corresponding author. Tel.: +965 2224 2999 x4343 (work); fax: +965 2249 2406.
26
Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry. BMC Genomics 2015; 16:92. [PMID: 25765185 PMCID: PMC4336699 DOI: 10.1186/s12864-015-1233-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 09/22/2014] [Accepted: 01/12/2015] [Indexed: 12/30/2022]
Abstract
Background The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, helping to understand human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe, “tent-dwelling” Bedouin, and Persian, who attribute their ancestry to different regions of the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and have been confirmed by genetic studies. In this work, we report the whole genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage. Results We document 3,573,824 SNPs, 404,090 insertions/deletions, and 11,138 structural variations. Of the reported SNPs and indels, 85,939 are novel. We identify 295 ‘loss-of-function’ and 2,314 ‘deleterious’ coding variants, some of which carry homozygous genotypes in the sequenced genome; the associated phenotypes include pharmacogenomic traits such as greater triglyceride-lowering ability with fenofibrate treatment and a requirement for high warfarin dosage to elicit an anticoagulation response. 6,328 non-coding SNPs associate with 811 phenotype traits: in congruence with the medical history of the participant for Type 2 diabetes and β-Thalassemia, and of the participant’s family for migraine, 72 (of 159 known) Type 2 diabetes, 3 (of 4) β-Thalassemia, and 76 (of 169) migraine variants are seen in the genome. Intergenome comparisons based on shared disease-causing variants position the sequenced genome between Asian and European genomes, in congruence with the geographical location of the region. In comparison, bead arrays perform better than sequencing platforms at correctly calling genotypes in low-coverage regions of the sequenced genome; however, a novel SNP or indel near the genotype-calling position can lead to false calls with bead arrays.
Conclusions We report, for the first time, a reference genome resource for the population of Persian ancestry. The resource provides a starting point for designing large-scale genetic studies in the Peninsula, including Kuwait, and in the Persian population. Such efforts on populations under-represented in global genome variation surveys help augment current knowledge of human genome diversity. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1233-x) contains supplementary material, which is available to authorized users.
27
Hallast P, Batini C, Zadik D, Maisano Delser P, Wetton JH, Arroyo-Pardo E, Cavalleri GL, de Knijff P, Destro Bisol G, Dupuy BM, Eriksen HA, Jorde LB, King TE, Larmuseau MH, López de Munain A, López-Parra AM, Loutradis A, Milasin J, Novelletto A, Pamjav H, Sajantila A, Schempp W, Sears M, Tolun A, Tyler-Smith C, Van Geystelen A, Watkins S, Winney B, Jobling MA. The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades. Mol Biol Evol 2014; 32:661-73. [PMID: 25468874 PMCID: PMC4327154 DOI: 10.1093/molbev/msu327] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Indexed: 12/18/2022]
Abstract
Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of nonsynonymous variants in 15 MSY single-copy genes.
Affiliation(s)
- Pille Hallast
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Chiara Batini
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Daniel Zadik
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Jon H Wetton
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Eduardo Arroyo-Pardo
- Laboratory of Forensic and Population Genetics, Department of Toxicology and Health Legislation, Faculty of Medicine, Complutense University, Madrid, Spain
- Gianpiero L Cavalleri
- Molecular and Cellular Therapeutics, The Royal College of Surgeons in Ireland, Dublin, Ireland
- Peter de Knijff
- Department of Human Genetics, Leiden University Medical Centre, Leiden, The Netherlands
- Giovanni Destro Bisol
- Istituto Italiano di Antropologia, Rome, Italy; Department of Environmental Biology, Sapienza University of Rome, Rome, Italy
- Berit Myhre Dupuy
- Division of Forensic Sciences, Norwegian Institute of Public Health, Oslo, Norway
- Heidi A Eriksen
- Centre of Arctic Medicine, Thule Institute, University of Oulu, Oulu, Finland; Utsjoki Health Care Centre, Utsjoki, Finland
- Lynn B Jorde
- Department of Human Genetics, University of Utah Health Sciences Center, Salt Lake City, UT
- Turi E King
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Maarten H Larmuseau
- Laboratory of Forensic Genetics and Molecular Archaeology, KU Leuven, Leuven, Belgium; Department of Imaging & Pathology, Biomedical Forensic Sciences, KU Leuven, Leuven, Belgium; Laboratory of Biodiversity and Evolutionary Genomics, Department of Biology, KU Leuven, Leuven, Belgium
- Ana M López-Parra
- Laboratory of Forensic and Population Genetics, Department of Toxicology and Health Legislation, Faculty of Medicine, Complutense University, Madrid, Spain
- Jelena Milasin
- School of Dental Medicine, Institute of Human Genetics, University of Belgrade, Belgrade, Serbia
- Horolma Pamjav
- Network of Forensic Science Institutes, Institute of Forensic Medicine, Budapest, Hungary
- Antti Sajantila
- Department of Forensic Medicine, Hjelt Institute, University of Helsinki, Helsinki, Finland; Department of Molecular and Medical Genetics, Institute of Applied Genetics, University of North Texas Health Science Center, Fort Worth, Texas
- Werner Schempp
- Institute of Human Genetics, University of Freiburg, Freiburg, Germany
- Matt Sears
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Aslıhan Tolun
- Department of Molecular Biology and Genetics, Boğaziçi University, Istanbul, Turkey
- Anneleen Van Geystelen
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- Scott Watkins
- Department of Human Genetics, University of Utah Health Sciences Center, Salt Lake City, UT
- Bruce Winney
- Department of Oncology, University of Oxford, Oxford, United Kingdom
- Mark A Jobling
- Department of Genetics, University of Leicester, Leicester, United Kingdom
28
Pseudoautosomal region 1 length polymorphism in the human population. PLoS Genet 2014; 10:e1004578. [PMID: 25375121 PMCID: PMC4222609 DOI: 10.1371/journal.pgen.1004578] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/06/2014] [Accepted: 07/07/2014] [Indexed: 12/30/2022]
Abstract
The human sex chromosomes differ in sequence, except for the pseudoautosomal regions (PAR) at the termini of the short and long arms, denoted PAR1 and PAR2. The boundary between PAR1 and the unique X and Y sequences was established during the divergence of the great apes. During a copy number variation screen, we noted a paternally inherited chromosome X duplication in 15 independent families. Subsequent genomic analysis demonstrated that an insertional translocation of X chromosomal sequence into the Y chromosome generates an extended PAR. The insertion is generated by non-allelic homologous recombination between a 548 bp LTR6B repeat within the Y chromosome PAR1 and a second LTR6B repeat located 105 kb from the PAR boundary on the X chromosome. The identification of the reciprocal deletion on the X chromosome in one family and the occurrence of the variant in different Y chromosome haplogroups demonstrate that this is a recurrent genomic rearrangement in the human population. This finding represents a novel mechanism shaping sex chromosomal evolution. The human sex chromosomes differ in sequence, except for homologous sequences at both ends, termed the pseudoautosomal regions (PAR1 and PAR2). PAR enables the pairing of chromosomes Y and X during meiosis. The PARs are located at the termini of the short and long arms, respectively, of chromosomes X and Y. The observation of gradual shortening of the Y chromosome over evolutionary time has led to speculation that the Y chromosome is “doomed to extinction.” However, the Y chromosome has been shaped over evolution not only by the loss of genes, but also by the addition of genes as a result of interchromosomal exchanges. In this work, we identified males with a duplication of about 136 kb on chromosome Xp22.33 as an incidental finding during a copy number variation screen.
We demonstrate that the duplicon is an insertional translocation from the X to the Y chromosome, caused by non-allelic homologous recombination and flanked by a long terminal repeat (LTR6B). We show that this translocation event has occurred independently multiple times and that the duplicated region recombines with the X chromosome. The duplicated region therefore represents an extension of the pseudoautosomal region, a novel mechanism shaping sex chromosomal evolution in humans.
29
Yang Y, Xie B, Yan J. Application of next-generation sequencing technology in forensic science. GENOMICS PROTEOMICS & BIOINFORMATICS 2014; 12:190-7. [PMID: 25462152 PMCID: PMC4411420 DOI: 10.1016/j.gpb.2014.09.001] [Citation(s) in RCA: 106] [Impact Index Per Article: 9.6] [Received: 07/01/2014] [Revised: 08/28/2014] [Accepted: 09/09/2014] [Indexed: 12/03/2022]
Abstract
Next-generation sequencing (NGS) technology, with its high-throughput capacity and low cost, has developed rapidly in recent years and become an important analytical tool for many genomics researchers. Harnessing NGS opens new opportunities in forensic research, because it can simultaneously analyze multiple loci of forensic interest in different genetic contexts, such as autosomes, mitochondrial DNA and sex chromosomes. NGS technology also has potential applications in many other areas of research, including DNA database construction, ancestry and phenotypic inference, monozygotic twin studies, body fluid and species identification, and forensic animal, plant and microbiological analyses. Here we review the application of NGS technology in forensic science, with the aim of providing a reference for future forensic studies and practice.
Affiliation(s)
- Yaran Yang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Bingbing Xie
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jiangwei Yan
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
30
Simonyan V, Mazumder R. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes (Basel) 2014; 5:957-81. [PMID: 25271953 PMCID: PMC4276921 DOI: 10.3390/genes5040957] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 09/11/2014] [Revised: 09/22/2014] [Accepted: 09/22/2014] [Indexed: 12/30/2022]
Abstract
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.
Affiliation(s)
- Vahan Simonyan
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA.
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA.
31
Larmuseau MH, Vanderheyden N, Van Geystelen A, Decorte R. A substantially lower frequency of uninformative matches between 23 versus 17 Y-STR haplotypes in north Western Europe. Forensic Sci Int Genet 2014; 11:214-9. [DOI: 10.1016/j.fsigen.2014.04.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 01/21/2014] [Revised: 04/03/2014] [Accepted: 04/05/2014] [Indexed: 01/31/2023]
32
Faison WJ, Rostovtsev A, Castro-Nallar E, Crandall KA, Chumakov K, Simonyan V, Mazumder R. Whole genome single-nucleotide variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes. Genomics 2014; 104:1-7. [PMID: 24930720 DOI: 10.1016/j.ygeno.2014.06.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/11/2014] [Accepted: 06/04/2014] [Indexed: 10/25/2022]
Abstract
Next-generation sequencing data can be mapped to a reference genome to identify single-nucleotide polymorphisms/variations (SNPs/SNVs; called SNPs hereafter). In theory, SNPs can be compared across several samples and the differences used to create phylogenetic trees depicting relatedness among the samples. In practice, however, this is difficult because there is currently no stand-alone tool that takes SNP data directly as input and produces phylogenetic trees. In response to this need, the PhyloSNP application was created with two analysis methods: (1) a quantitative method that either creates a presence/absence matrix, which can be used directly to generate phylogenetic trees, or builds a tree from a shrunk genome alignment (which includes additional bases surrounding each SNP position); and (2) a qualitative method that clusters samples based on the frequency of the different bases found at a particular position. The algorithms were used to generate trees from poliovirus, Burkholderia, and human cancer genomics NGS datasets. AVAILABILITY: PhyloSNP is freely available for download at http://hive.biochemistry.gwu.edu/dna.cgi?cmd=phylosnp.
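The presence/absence idea described for PhyloSNP can be illustrated with a short Python sketch. This is not PhyloSNP's code: it assumes each sample's SNP calls are already available as a set of hypothetical (position, alternate base) pairs, builds the binary matrix over the union of all calls, and derives pairwise Hamming distances that a standard tree-building method (for example neighbor-joining or UPGMA) could then consume.

```python
from itertools import combinations


def presence_absence_matrix(sample_snps):
    """Return (columns, matrix) where matrix[sample][j] is 1 if that sample
    carries the j-th SNP in the union of all samples' calls, else 0."""
    # Sort the union of all calls so that the column order is deterministic.
    columns = sorted(set().union(*sample_snps.values()))
    matrix = {
        sample: [1 if snp in snps else 0 for snp in columns]
        for sample, snps in sorted(sample_snps.items())
    }
    return columns, matrix


def hamming_distances(matrix):
    """Pairwise number of columns in which two samples differ."""
    dist = {}
    for a, b in combinations(sorted(matrix), 2):
        d = sum(x != y for x, y in zip(matrix[a], matrix[b]))
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical SNP calls: sample -> set of (position, alternate base).
calls = {
    "S1": {(101, "A"), (250, "T")},
    "S2": {(101, "A"), (250, "T"), (399, "G")},
    "S3": {(512, "C")},
}
columns, matrix = presence_absence_matrix(calls)
print(columns)  # [(101, 'A'), (250, 'T'), (399, 'G'), (512, 'C')]
print(matrix)   # {'S1': [1, 1, 0, 0], 'S2': [1, 1, 1, 0], 'S3': [0, 0, 0, 1]}
print(hamming_distances(matrix)[("S1", "S2")])  # 1
```

In this toy input S1 and S2 differ at a single position, while S3 shares none of their calls (distances 3 and 4), so a tree built from these distances would group S1 with S2.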
Affiliation(s)
- William J Faison
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA.
- Alexandre Rostovtsev
- Center for Biologics Evaluation and Research, US Food and Drug Administration, 1451 Rockville Pike, Rockville, MD 20852, USA.
- Eduardo Castro-Nallar
- Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA.
- Keith A Crandall
- Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA.
- Konstantin Chumakov
- Center for Biologics Evaluation and Research, US Food and Drug Administration, 1451 Rockville Pike, Rockville, MD 20852, USA.
- Vahan Simonyan
- Center for Biologics Evaluation and Research, US Food and Drug Administration, 1451 Rockville Pike, Rockville, MD 20852, USA.
- Raja Mazumder
- The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA; McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
33
Alsmadi O, John SE, Thareja G, Hebbar P, Antony D, Behbehani K, Thanaraj TA. Genome at juncture of early human migration: a systematic analysis of two whole genomes and thirteen exomes from Kuwaiti population subgroup of inferred Saudi Arabian tribe ancestry. PLoS One 2014; 9:e99069. [PMID: 24896259 PMCID: PMC4045902 DOI: 10.1371/journal.pone.0099069] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 08/13/2013] [Accepted: 05/10/2014] [Indexed: 01/19/2023]
Abstract
The population of the State of Kuwait is composed of three genetic subgroups of inferred Persian, Saudi Arabian tribe, and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 single nucleotide polymorphisms (SNPs), 515,802 indels, and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels, and 374,959 (74.08%) non-exomic indels are 'novel'. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs are seen at low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; p < 2.2 × 10⁻¹⁶). A set of 2,485 SNPs shows significantly different allele frequencies compared with populations from other continents. Two notable variants with risk alleles at high frequency in this subgroup are a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T] from the CYP4F2 gene [MIM:*604426]) associated with the warfarin dosage levels [MIM:#122700] required to elicit a normal anticoagulant response, and a 3' UTR SNP (rs6151429 [22:g.51063477T>C] from the ARSA gene [MIM:*607574]) associated with metachromatic leukodystrophy [MIM:#250100]. The hemoglobin Riyadh variant (first identified in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives, who are believed to have received substantial gene flow from Africa and of eastern provenance.
We present the first genome resource for the Saudi Arabian tribe subgroup, which is essential for designing future genetic studies in this subgroup. The full-length genome sequences and the identified variants are available at ftp://dgr.dasmaninstitute.org and http://dgr.dasmaninstitute.org/DGR/gb.html.
Affiliation(s)
- Osama Alsmadi
- Dasman Diabetes Institute, Dasman, Kuwait
34
Van Geystelen A, Wenseleers T, Decorte R, Caspers MJL, Larmuseau MHD. In silico detection of phylogenetic informative Y-chromosomal single nucleotide polymorphisms from whole genome sequencing data. Electrophoresis 2014; 35:3102-10. [DOI: 10.1002/elps.201300459] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/17/2013] [Revised: 12/03/2013] [Accepted: 01/07/2014] [Indexed: 12/25/2022]
Affiliation(s)
- Anneleen Van Geystelen
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- Tom Wenseleers
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- Ronny Decorte
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Biomedical Forensic Sciences, Department of Imaging & Pathology, KU Leuven, Leuven, Belgium
- Maarten J. L. Caspers
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Laboratory of Biodiversity and Evolutionary Genomics, Department of Biology, KU Leuven, Leuven, Belgium
- Maarten H. D. Larmuseau
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Biomedical Forensic Sciences, Department of Imaging & Pathology, KU Leuven, Leuven, Belgium
- Laboratory of Biodiversity and Evolutionary Genomics, Department of Biology, KU Leuven, Leuven, Belgium
35
Larmuseau MHD, Vanderheyden N, Van Geystelen A, van Oven M, de Knijff P, Decorte R. Recent Radiation within Y-chromosomal Haplogroup R-M269 Resulted in High Y-STR Haplotype Resemblance. Ann Hum Genet 2014; 78:92-103. [DOI: 10.1111/ahg.12050] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 07/31/2013] [Accepted: 11/07/2013] [Indexed: 01/18/2023]
Affiliation(s)
- Maarten H. D. Larmuseau
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Department of Imaging & Pathology, Biomedical Forensic Sciences, KU Leuven, Leuven, Belgium
- Laboratory of Biodiversity and Evolutionary Genomics, Department of Biology, KU Leuven, Leuven, Belgium
- Nancy Vanderheyden
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Anneleen Van Geystelen
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- Mannis van Oven
- Department of Forensic Molecular Biology, Erasmus MC - University Medical Center Rotterdam, Rotterdam, The Netherlands
- Peter de Knijff
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Ronny Decorte
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Department of Imaging & Pathology, Biomedical Forensic Sciences, KU Leuven, Leuven, Belgium
36
Cole C, Krampis K, Karagiannis K, Almeida JS, Faison WJ, Motwani M, Wan Q, Golikov A, Pan Y, Simonyan V, Mazumder R. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data. BMC Bioinformatics 2014; 15:28. [PMID: 24467687 PMCID: PMC3916084 DOI: 10.1186/1471-2105-15-28] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 11/05/2013] [Accepted: 01/22/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Next-generation sequencing (NGS) technologies have produced petabytes of scattered data, decentralized in archives, databases, and sometimes isolated hard disks that are inaccessible for browsing and analysis. Curated secondary databases are expected to help organize some of this Big Data, allowing users to better navigate, search, and compute on it. RESULTS To address this challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine, which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof of concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording available clinical information on patients in CSR, mapping the reads to the reference, identifying non-synonymous single nucleotide variations (nsSNVs), and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. We have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analyzing short read sequence data to identify rare and novel SNVs that are not present in dbSNP, and therefore provides a more comprehensive understanding of the human variome.
Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). CONCLUSIONS Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.
Affiliation(s)
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA.
37
Larmuseau MHD, Vanoverbeke J, Van Geystelen A, Defraene G, Vanderheyden N, Matthys K, Wenseleers T, Decorte R. Low historical rates of cuckoldry in a Western European human population traced by Y-chromosome and genealogical data. Proc Biol Sci 2013; 280:20132400. [PMID: 24266034 PMCID: PMC3813347 DOI: 10.1098/rspb.2013.2400] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 09/17/2013] [Accepted: 09/30/2013] [Indexed: 11/12/2022]
Abstract
Recent evidence suggests that seeking out extra-pair paternity (EPP) can be a viable alternative reproductive strategy for both males and females in many pair-bonded species, including humans. Accurate data on EPP rates in humans, however, are scant and mostly restricted to extant populations. Here, we provide the first large-scale, unbiased genetic study of historical EPP rates in a Western European human population based on combining Y-chromosomal data to infer genetic patrilineages with genealogical and surname data, which reflect known historical presumed paternity. Using two independent methods, we estimate that over the last few centuries, EPP rates in Flanders (Belgium) were only around 1–2% per generation. This figure is substantially lower than the 8–30% per generation reported in some behavioural studies on historical EPP rates, but comparable with the rates reported by other genetic studies of contemporary Western European populations. These results suggest that human EPP rates have not changed substantially during the last 400 years in Flanders and imply that legal genealogies rarely differ from the biological ones. This result has significant implications for a diverse set of fields, including human population genetics, historical demography, forensic science and human sociobiology.
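The abstract reports EPP rates per generation. As a simplified illustration only (not the authors' two estimation methods), a cumulative fraction D of discordant comparisons between men whose genealogical link spans g paternal transmissions can be converted into a per-generation rate r, assuming events occur independently with the same probability in every generation, so that 1 - D = (1 - r)^g. This sketch ignores other sources of discordance (such as mutations, adoptions, or sample errors), and the input numbers are hypothetical.

```python
def per_generation_rate(discordant_fraction: float, generations: int) -> float:
    """Solve 1 - D = (1 - r) ** g for the per-generation rate r."""
    if not 0.0 <= discordant_fraction < 1.0:
        raise ValueError("discordant_fraction must lie in [0, 1)")
    if generations <= 0:
        raise ValueError("generations must be a positive number of transmissions")
    return 1.0 - (1.0 - discordant_fraction) ** (1.0 / generations)


def expected_discordance(rate: float, generations: int) -> float:
    """Forward direction: chance that at least one event occurred in g generations."""
    return 1.0 - (1.0 - rate) ** generations


# Hypothetical example: 10% of comparisons discordant across 10 transmissions.
r = per_generation_rate(0.10, 10)
print(f"per-generation rate: {r:.4f}")  # per-generation rate: 0.0105
print(f"round trip: {expected_discordance(r, 10):.2f}")  # round trip: 0.10
```

With a hypothetical 10% discordance over 10 transmissions, the implied rate is about 1.05% per generation, which shows how a small per-generation rate can still accumulate into noticeable discordance over long genealogies.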
Affiliation(s)
- M. H. D. Larmuseau
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Laboratory of Biodiversity and Evolutionary Genomics, Department of Biology, KU Leuven, Leuven, Belgium
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- J. Vanoverbeke
- Laboratory of Aquatic Ecology and Evolutionary Biology, Department of Biology, KU Leuven, Leuven, Belgium
- A. Van Geystelen
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- G. Defraene
- Department of Radiation Oncology, UZ Leuven, Leuven, Belgium
- N. Vanderheyden
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- K. Matthys
- Centre for Sociological Research (CESO), Family and Population Studies, KU Leuven, Leuven, Belgium
- T. Wenseleers
- Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- R. Decorte
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Biomedical Forensic Sciences, Department of Imaging and Pathology, KU Leuven, Leuven, Belgium
38
Zhang F, Chen R, Liu D, Yao X, Li G, Jin Y, Yu C, Li Y, Coin LJM. YHap: a population model for probabilistic assignment of Y haplogroups from re-sequencing data. BMC Bioinformatics 2013; 14:331. [PMID: 24252171 PMCID: PMC4225519 DOI: 10.1186/1471-2105-14-331] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/22/2013] [Accepted: 11/12/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Y haplogroup analyses are an important component of genealogical reconstruction, population genetic analyses, medical genetics and forensics. These fields are increasingly moving towards use of low-coverage, high throughput sequencing. While there have been methods recently proposed for assignment of Y haplogroups on the basis of high-coverage sequence data, assignment on the basis of low-coverage data remains challenging. RESULTS We developed a new algorithm, YHap, which uses an imputation framework to jointly predict Y chromosome genotypes and assign Y haplogroups using low coverage population sequence data. We use data from the 1000 genomes project to demonstrate that YHap provides accurate Y haplogroup assignment with less than 2x coverage. CONCLUSIONS Borrowing information across multiple samples within a population using an imputation framework enables accurate Y haplogroup assignment.
39
van Oven M, Van Geystelen A, Kayser M, Decorte R, Larmuseau MHD. Seeing the wood for the trees: a minimal reference phylogeny for the human Y chromosome. Hum Mutat 2013; 35:187-91. [PMID: 24166809 DOI: 10.1002/humu.22468] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.8] [Received: 07/17/2013] [Accepted: 10/11/2013] [Indexed: 11/11/2022]
Abstract
During the last few decades, a wealth of studies dedicated to the human Y chromosome and its DNA variation, in particular Y-chromosome single-nucleotide polymorphisms (Y-SNPs), has led to the construction of a well-established Y-chromosome phylogeny. Since the recent advent of new sequencing technologies, the discovery of additional Y-SNPs is exploding and their continuous incorporation in the phylogenetic tree is leading to an ever higher resolution. However, the large and increasing amount of information included in the "complete" Y-chromosome phylogeny, which now already includes many thousands of identified Y-SNPs, can be overwhelming and complicates its understanding as well as the task of selecting suitable markers for genotyping purposes in evolutionary, demographic, anthropological, genealogical, medical, and forensic studies. As a solution, we introduce a concise reference phylogeny whereby we do not aim to provide an exhaustive tree that includes all known Y-SNPs but, rather, a quite stable reference tree aiming for optimal global discrimination capacity based on a strongly reduced set that includes only the most resolving Y-SNPs. Furthermore, with this reference tree, we wish to propose a common standard for Y-marker as well as Y-haplogroup nomenclature. The current version of our tree is based on a core set of 417 branch-defining Y-SNPs and is available online at http://www.phylotree.org/Y.
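A reference phylogeny built from branch-defining Y-SNPs lends itself to a simple top-down assignment: start at the root and descend into a child branch only when all of its defining markers are observed in the derived state. The sketch below is a hypothetical illustration of that idea, not the PhyloTree format or any published tool; the haplogroup labels and marker names are invented placeholders, and it does not handle missing calls, recurrent mutations, or conflicting markers, which real assignment tools must address.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                      # haplogroup label (hypothetical placeholder)
    markers: list                  # branch-defining SNPs for this node (hypothetical)
    children: list = field(default_factory=list)


def assign(node, derived, path=None):
    """Walk down the tree while a child's defining markers are all derived.

    Returns the deepest haplogroup reached and the list of nodes visited.
    The first fully supported child is followed; ties are not resolved.
    """
    path = (path or []) + [node.name]
    for child in node.children:
        if all(marker in derived for marker in child.markers):
            return assign(child, derived, path)
    return node.name, path


# Toy reference tree with invented names.
tree = Node("root", [], [
    Node("Alpha", ["m1"], [
        Node("Alpha1", ["m2"], [Node("Alpha1a", ["m3"])]),
        Node("Alpha2", ["m4"]),
    ]),
    Node("Beta", ["m5"]),
])

print(assign(tree, {"m1", "m2"}))  # ('Alpha1', ['root', 'Alpha', 'Alpha1'])
print(assign(tree, {"m5"}))        # ('Beta', ['root', 'Beta'])
```

A sample carrying derived alleles at m1 and m2 is assigned to Alpha1, because Alpha1a would additionally require m3; genotyping fewer but well-chosen markers therefore fixes how deep an assignment can go.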
Affiliation(s)
- Mannis van Oven
- Department of Forensic Molecular Biology, Erasmus MC - University Medical Center Rotterdam, Rotterdam, The Netherlands
40
Larmuseau MHD, Delorme P, Germain P, Vanderheyden N, Gilissen A, Van Geystelen A, Cassiman JJ, Decorte R. Genetic genealogy reveals true Y haplogroup of House of Bourbon contradicting recent identification of the presumed remains of two French Kings. Eur J Hum Genet 2013; 22:681-7. [PMID: 24105374 DOI: 10.1038/ejhg.2013.211] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 05/14/2013] [Revised: 08/12/2013] [Accepted: 08/15/2013] [Indexed: 11/09/2022]
Abstract
Genetic analysis greatly increases the opportunity to identify skeletal remains or other biological samples from historical figures. However, validation of such an identification is essential and should be done by DNA typing of living relatives. Based on the similarity of a limited set of Y-STRs, a blood sample and a head were recently identified as belonging to King Louis XVI and his paternal ancestor King Henry IV, respectively. Here, we collected DNA samples from three living males of the House of Bourbon to validate the identification of these remains, which has since become controversial. The three living relatives revealed the Bourbon Y-chromosomal variant at high phylogenetic resolution for several members of the lineage between Henry IV and Louis XVI. This 'true' Bourbon variant differs from the published Y-STR profiles of both the blood and the head, so the earlier identifications of these samples cannot be validated. Moreover, matrilineal genealogical data revealed that the published mtDNA sequence of the head also differed from that of a series of relatives. We therefore conclude that the analyzed samples did not come from the French kings. Our study again demonstrates that accurate genetic identification of historical remains requires DNA typing of living persons who are paternally or maternally related to the presumed donor of the samples.
Affiliation(s)
- Maarten H D Larmuseau
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium; Forensic Biomedical Sciences, Department of Imaging and Pathology, KU Leuven, Leuven, Belgium; Laboratory of Biodiversity and Evolutionary Genomics, Department of Biology, KU Leuven, Leuven, Belgium
- Nancy Vanderheyden
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Anja Gilissen
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Anneleen Van Geystelen
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium; Laboratory of Socioecology and Social Evolution, Department of Biology, KU Leuven, Leuven, Belgium
- Jean-Jacques Cassiman
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium
- Ronny Decorte
- Laboratory of Forensic Genetics and Molecular Archaeology, UZ Leuven, Leuven, Belgium; Forensic Biomedical Sciences, Department of Imaging and Pathology, KU Leuven, Leuven, Belgium
41
van Oven M, Toscani K, van den Tempel N, Ralf A, Kayser M. Multiplex genotyping assays for fine-resolution subtyping of the major human Y-chromosome haplogroups E, G, I, J, and R in anthropological, genealogical, and forensic investigations. Electrophoresis 2013; 34:3029-38. [PMID: 23893838 DOI: 10.1002/elps.201300210] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/27/2013] [Revised: 06/17/2013] [Accepted: 06/26/2013] [Indexed: 12/20/2022]
Abstract
Inherited DNA polymorphisms located within the nonrecombining portion of the human Y chromosome provide a powerful means of tracking the patrilineal ancestry of male individuals. Recently, we introduced an efficient genotyping method for the detection of the basal Y-chromosome haplogroups A to T, as well as an additional method for the dissection of haplogroup O into its sublineages. To further extend the use of the Y chromosome as an evolutionary marker, we here introduce a set of genotyping assays for fine-resolution subtyping of haplogroups E, G, I, J, and R, which make up the bulk of Western Eurasian and African Y chromosomes. The marker selection includes a total of 107 carefully selected bi-allelic polymorphisms that were divided into eight hierarchically organized multiplex assays (two for haplogroup E, one for I, one for J, one for G, and three for R) based on the single-base primer extension (SNaPshot) technology. Our method not only allows enhanced Y-chromosome lineage discrimination; the more restricted geographic distribution of the subhaplogroups covered also enables finer-scale estimation of patrilineal biogeographic origin. Supplementing our previous method for basal Y-haplogroup detection, the currently introduced assays are thus expected to be of major relevance for future DNA studies targeting male-specific ancestry for forensic, anthropological, and genealogical purposes.
Affiliation(s)
- Mannis van Oven
- Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
42
Larmuseau MHD, Vanderheyden N, Van Geystelen A, van Oven M, Kayser M, Decorte R. Increasing phylogenetic resolution still informative for Y chromosomal studies on West-European populations. Forensic Sci Int Genet 2013; 9:179-85. [PMID: 23683810 DOI: 10.1016/j.fsigen.2013.04.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 02/05/2013] [Revised: 03/22/2013] [Accepted: 04/07/2013] [Indexed: 01/28/2023]
Abstract
Many of the Y-chromosomal lineages defined in the latest phylogenetic tree of the human Y chromosome, published by the Y Chromosome Consortium (YCC) in 2008, are distributed in (Western) Europe, because a large number of phylogeographic studies have focused on this area. This raises the question of whether newly discovered polymorphisms on the Y chromosome are still informative for studying Western Europeans at the population genetic level. To address this question, the West-European region of Flanders (Belgium) was selected as the study area, since more than 1000 Y chromosomes from this area had previously been genotyped at the highest resolution of the 2008 YCC tree and coupled to in-depth genealogical data. Based on these data, the temporal changes of the population genetic pattern within Flanders over the last centuries were studied and the effects of several past gene flow events were identified. In the present study, a set of recently reported novel Y-SNPs was genotyped to further characterize all Flemish Y chromosomes belonging to haplogroups G, R-M269, and T. With this extended Y-SNP set, the discrimination power increased drastically, as previously large (sub-)haplogroups are now subdivided into several non-marginal groups. Next, the previously observed population structure within Flanders appeared to result from different gradients of independent sub-haplogroups. Moreover, for the first time within Flanders, a significant East-West gradient was observed in the frequency of two R-M269 lineages, and this gradient persists when the current residence of the DNA donors is considered. Our results thus suggest that updating the Y-chromosomal tree with new polymorphisms is still useful to increase Y-SNP-based discrimination power and to study population genetic patterns in more detail, even in an already well-studied region such as Western Europe.
Affiliation(s)
- M H D Larmuseau
- UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology, Leuven, Belgium; KU Leuven, Forensic Medicine, Department of Imaging & Pathology, Leuven, Belgium; KU Leuven, Laboratory of Biodiversity and Evolutionary Genomics, Department of Biology, Leuven, Belgium.
- N Vanderheyden
- UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology, Leuven, Belgium
- A Van Geystelen
- UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology, Leuven, Belgium; KU Leuven, Laboratory of Socioecology and Social Evolution, Department of Biology, Leuven, Belgium
- M van Oven
- Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
- M Kayser
- Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
- R Decorte
- UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology, Leuven, Belgium; KU Leuven, Forensic Medicine, Department of Imaging & Pathology, Leuven, Belgium
43
Van Geystelen A, Decorte R, Larmuseau MHD. Updating the Y-chromosomal phylogenetic tree for forensic applications based on whole genome SNPs. Forensic Sci Int Genet 2013; 7:573-80. [PMID: 23597787 DOI: 10.1016/j.fsigen.2013.03.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 02/20/2013] [Accepted: 03/19/2013] [Indexed: 01/17/2023]
Abstract
The Y-chromosomal phylogenetic tree has a wide variety of important forensic applications and therefore needs to be state-of-the-art. However, since the last 'official' published tree, many publications have reported additional Y-chromosomal lineages and other phylogenetic topologies. This makes it difficult for forensic scientists to interpret those reports and to use an up-to-date tree and corresponding nomenclature in their daily work. Whole genome sequencing (WGS) data are useful to verify and optimise the current phylogenetic tree for haploid markers. The AMY-tree software is the first open-access program that analyses WGS data for Y-chromosomal phylogenetic applications. Here, all published information is collected in a phylogenetic tree, and the correctness of this tree is checked in the first large analysis of 747 WGS samples with AMY-tree. The result is one phylogenetic tree containing all peer-reviewed reported Y-SNPs, excluding the observed recurrent and ambiguous mutations. Nevertheless, the results showed that genomes are currently available for only a limited set of Y-chromosomal (sub-)haplogroups and that many newly reported Y-SNPs from WGS projects are false positives, even with high-coverage sequencing methods. This study demonstrates the usefulness of AMY-tree for checking the quality of the present Y-chromosomal tree and highlights the difficulty of enlarging this tree based on WGS methods alone.
Affiliation(s)
- A Van Geystelen
- UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology, Leuven, Belgium; KU Leuven, Department of Biology, Laboratory of Socioecology and Social Evolution, Leuven, Belgium
- R Decorte
- UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology, Leuven, Belgium; KU Leuven, Department of Imaging & Pathology, Forensic Medicine, Leuven, Belgium
- M H D Larmuseau
- UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology, Leuven, Belgium; KU Leuven, Department of Imaging & Pathology, Forensic Medicine, Leuven, Belgium; KU Leuven, Department of Biology, Laboratory of Biodiversity and Evolutionary Genomics, Leuven, Belgium.