1
Reynolds AZ, Niedbalski SD. Sex-biased gene regulation varies across human populations as a result of adaptive evolution. AMERICAN JOURNAL OF BIOLOGICAL ANTHROPOLOGY 2024; 183:e24888. [PMID: 38100225 DOI: 10.1002/ajpa.24888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 11/14/2023] [Accepted: 11/28/2023] [Indexed: 03/03/2024]
Abstract
OBJECTIVES: Studies of human sexual dimorphism and gender disparities in health focus on ostensibly universal molecular sex differences, such as sex chromosomes and circulating hormone levels, while ignoring the extraordinary diversity in biology, behavior, and culture acquired by different human populations over their unique evolutionary histories. MATERIALS AND METHODS: Using RNA-Seq data and whole genome sequences from 1000G and HGDP, we investigate variation in sex-biased gene expression across 11 human populations and test whether population-level variation in sex-biased expression may have resulted from adaptive evolution in regions containing sex-specific regulatory variants. RESULTS: We find that sex-biased gene expression in humans is highly variable, mostly population-specific, and demonstrates between-population reversals. Expression quantitative trait locus mapping reveals sex-specific regulatory regions with evidence of recent positive natural selection, suggesting that variation in sex-biased expression may have evolved as an adaptive response to ancestral environments experienced by human populations. DISCUSSION: These results indicate that sex-biased gene expression is more flexible than previously thought and is not generally shared among human populations. Instead, molecular phenotypes associated with sex depend on complex interactions between population-specific molecular evolution and physiological responses to contemporary socioecologies.
Affiliation(s)
- Adam Z Reynolds
  - Department of Anthropology, University of New Mexico, Albuquerque, New Mexico, USA
  - Department of Internal Medicine, University of New Mexico, Albuquerque, New Mexico, USA
- Sara D Niedbalski
  - Department of Anthropology, University of New Mexico, Albuquerque, New Mexico, USA
  - Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, CNRS, Paris, France
2
Taylor DJ, Chhetri SB, Tassia MG, Biddanda A, Battle A, McCoy RC. Sources of gene expression variation in a globally diverse human cohort. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.04.565639. [PMID: 37965206 PMCID: PMC10635147 DOI: 10.1101/2023.11.04.565639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
Genetic variation influencing gene expression and splicing is a key source of phenotypic diversity. Though invaluable, studies investigating these links in humans have been strongly biased toward participants of European ancestries, diminishing generalizability and hindering evolutionary research. To address these limitations, we developed MAGE, an open-access RNA-seq data set of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, mirroring variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-eQTLs and cis-sQTLs, respectively), identifying >15,000 putatively causal eQTLs and >16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1310 eQTLs and 1657 sQTLs that are largely private to previously underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations and that apparent "population-specific" effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands understanding of gene expression diversity across human populations and provides an inclusive resource for studying the evolution and function of human genomes.
Affiliation(s)
- Dylan J. Taylor
  - Department of Biology, Johns Hopkins University, Baltimore MD, USA
- Surya B. Chhetri
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore MD, USA
- Arjun Biddanda
  - Department of Biology, Johns Hopkins University, Baltimore MD, USA
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
  - Department of Genetic Medicine, Johns Hopkins University, Baltimore MD, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore MD, USA
- Rajiv C. McCoy
  - Department of Biology, Johns Hopkins University, Baltimore MD, USA
3
DeGorter MK, Goddard PC, Karakoc E, Kundu S, Yan SM, Nachun D, Abell N, Aguirre M, Carstensen T, Chen Z, Durrant M, Dwaracherla VR, Feng K, Gloudemans MJ, Hunter N, Moorthy MPS, Pomilla C, Rodrigues KB, Smith CJ, Smith KS, Ungar RA, Balliu B, Fellay J, Flicek P, McLaren PJ, Henn B, McCoy RC, Sugden L, Kundaje A, Sandhu MS, Gurdasani D, Montgomery SB. Transcriptomics and chromatin accessibility in multiple African population samples. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.04.564839. [PMID: 37986808 PMCID: PMC10659267 DOI: 10.1101/2023.11.04.564839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Mapping the functional human genome and impact of genetic variants is often limited to European-descendent population samples. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals, to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and provide predictions for the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements and variants, understand population genetic history of molecular quantitative trait loci, and further resolve the genetic basis of multiple human traits and disease.
Affiliation(s)
- Page C Goddard
  - Department of Genetics, Stanford University, Stanford, CA
- Emre Karakoc
  - Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Soumya Kundu
  - Department of Computer Science, Stanford University, Stanford, CA
- Daniel Nachun
  - Department of Pathology, Stanford University, Stanford, CA
- Nathan Abell
  - Department of Genetics, Stanford University, Stanford, CA
- Matthew Aguirre
  - Department of Biomedical Data Science, Stanford University, Stanford, CA
- Tommy Carstensen
  - Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Ziwei Chen
  - Department of Computer Science, Stanford University, Stanford, CA
- Karen Feng
  - Department of Biomedical Data Science, Stanford University, Stanford, CA
- Naiomi Hunter
  - Department of Genetics, Stanford University, Stanford, CA
- Cristina Pomilla
  - Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Kevin S Smith
  - Department of Pathology, Stanford University, Stanford, CA
- Rachel A Ungar
  - Department of Genetics, Stanford University, Stanford, CA
- Brunilda Balliu
  - Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA and Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA
- Jacques Fellay
  - School of Life Sciences, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland and Precision Medicine Unit, Biomedical Data Science Center, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Paul Flicek
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Paul J McLaren
  - Sexually Transmitted and Blood-Borne Infections Division at JC Wilt Infectious Diseases Research Centre, National Microbiology Laboratory Branch, Public Health Agency of Canada, Winnipeg, Canada and Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada
- Brenna Henn
  - Department of Anthropology, University of California Davis, Davis CA and Genome Center, University of California Davis, Davis CA
- Rajiv C McCoy
  - Department of Biology, Johns Hopkins University, Baltimore
- Lauren Sugden
  - Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, PA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, CA
  - Department of Computer Science, Stanford University, Stanford, CA
- Deepti Gurdasani
  - William Harvey Research Institute, Queen Mary University of London, London, UK; Kirby Institute, University of New South Wales, Australia; School of Medicine, University of Western Australia, Australia
4
Wang J, Gazal S. Ancestry-specific regulatory and disease architectures are likely due to cell-type-specific gene-by-environment interactions. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.10.20.23297214. [PMID: 37905038 PMCID: PMC10615008 DOI: 10.1101/2023.10.20.23297214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Multi-ancestry genome-wide association studies (GWAS) have highlighted the existence of variants with ancestry-specific effect sizes. Understanding where and why these ancestry-specific effects occur is fundamental to understanding the genetic basis of human diseases and complex traits. Here, we characterized genes differentially expressed across ancestries (ancDE genes) at the cell-type level by leveraging single-cell RNA-seq data in peripheral blood mononuclear cells for 21 individuals with East Asian (EAS) ancestry and 23 individuals with European (EUR) ancestry (172K cells); then, we tested if variants surrounding those genes were enriched in disease variants with ancestry-specific effect sizes by leveraging ancestry-matched GWAS of 31 diseases and complex traits (average N = 90K and 267K in EAS and EUR, respectively). We observed that ancDE genes tend to be cell-type-specific, to be enriched in genes interacting with the environment, and in variants with ancestry-specific disease effect sizes, suggesting the impact of shared cell-type-specific gene-by-environment (GxE) interactions between regulatory and disease architectures. Finally, we illustrated how GxE interactions might have led to ancestry-specific MCL1 expression in B cells, and ancestry-specific allele effect sizes in lymphocyte count GWAS for variants surrounding MCL1. Our results imply that large single-cell and GWAS datasets in diverse populations are required to improve our understanding of the effect of genetic variants on human diseases.
Affiliation(s)
- Juehan Wang
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
5
Puixeu G, Macon A, Vicoso B. Sex-specific estimation of cis and trans regulation of gene expression in heads and gonads of Drosophila melanogaster. G3 (BETHESDA, MD.) 2023; 13:jkad121. [PMID: 37259621 PMCID: PMC10411594 DOI: 10.1093/g3journal/jkad121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/17/2023] [Accepted: 05/17/2023] [Indexed: 06/02/2023]
Abstract
The regulatory architecture of gene expression is known to differ substantially between sexes in Drosophila, but most studies performed so far used whole-body data and only single crosses, which may have limited their scope to detect patterns that are robust across tissues and biological replicates. Here, we use allele-specific gene expression of parental and reciprocal hybrid crosses between 6 Drosophila melanogaster inbred lines to quantify cis- and trans-regulatory variation in heads and gonads of both sexes separately across 3 replicate crosses. Our results suggest that female and male heads, as well as ovaries, have a similar regulatory architecture. On the other hand, testes display more and substantially different cis-regulatory effects, suggesting that sex differences in the regulatory architecture that have been previously observed may largely derive from testis-specific effects. We also examine the difference in cis-regulatory variation of genes across different levels of sex bias in gonads and heads. Consistent with the idea that intersex correlations constrain expression and can lead to sexual antagonism, we find more cis variation in unbiased and moderately biased genes in heads. In ovaries, reduced cis variation is observed for male-biased genes, suggesting that cis variants acting on these genes in males do not lead to changes in ovary expression. Finally, we examine the dominance patterns of gene expression and find that sex- and tissue-specific patterns of inheritance as well as trans-regulatory variation are highly variable across biological crosses, although these were performed in highly controlled experimental conditions. This highlights the importance of using various genetic backgrounds to infer generalizable patterns.
Affiliation(s)
- Gemma Puixeu
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg 3400, Austria
- Ariana Macon
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg 3400, Austria
- Beatriz Vicoso
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg 3400, Austria
6
Kelly DE, Ramdas S, Ma R, Rawlings-Goss RA, Grant GR, Ranciaro A, Hirbo JB, Beggs W, Yeager M, Chanock S, Nyambo TB, Omar SA, Woldemeskel D, Belay G, Li H, Brown CD, Tishkoff SA. The genetic and evolutionary basis of gene expression variation in East Africans. Genome Biol 2023; 24:35. [PMID: 36829244 PMCID: PMC9951478 DOI: 10.1186/s13059-023-02874-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Accepted: 02/13/2023] [Indexed: 02/26/2023] Open
Abstract
BACKGROUND: Mapping of quantitative trait loci (QTL) associated with molecular phenotypes is a powerful approach for identifying the genes and molecular mechanisms underlying human traits and diseases, though most studies have focused on individuals of European descent. While important progress has been made to study a greater diversity of human populations, many groups remain unstudied, particularly among indigenous populations within Africa. To better understand the genetics of gene regulation in East Africans, we perform expression and splicing QTL mapping in whole blood from a cohort of 162 diverse Africans from Ethiopia and Tanzania. We assess replication of these QTLs in cohorts of predominantly European ancestry and identify candidate genes under selection in human populations. RESULTS: We find the gene regulatory architecture of African and non-African populations is broadly shared, though there is a considerable amount of variation at individual loci across populations. Comparing our analyses to an equivalently sized cohort of European Americans, we find that QTL mapping in Africans improves the detection of expression QTLs and fine-mapping of causal variation. Integrating our QTL scans with signatures of natural selection, we find several genes related to immunity and metabolism that are highly differentiated between Africans and non-Africans, as well as a gene associated with pigmentation. CONCLUSION: Extending QTL mapping studies beyond European ancestry, particularly to diverse indigenous populations, is vital for a complete understanding of the genetic architecture of human traits and can reveal novel functional variation underlying human traits and disease.
Affiliation(s)
- Derek E Kelly
  - Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
  - Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Shweta Ramdas
  - Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Rong Ma
  - Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
- Jibril B Hirbo
  - Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- William Beggs
  - Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Meredith Yeager
  - Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Stephen Chanock
  - Division of Cancer Epidemiology and Genetics, National Institutes of Health, Rockville, MD, USA
- Thomas B Nyambo
  - Department of Biochemistry, Kampala International University in Tanzania, Dar Es Salaam, Tanzania
- Sabah A Omar
  - Center for Biotechnology Research and Development, Kenya Medical Research Institute, Nairobi, Kenya
- Dawit Woldemeskel
  - Microbial Cellular and Molecular Biology Department, Addis Ababa University, Addis Ababa, Ethiopia
- Gurja Belay
  - Microbial Cellular and Molecular Biology Department, Addis Ababa University, Addis Ababa, Ethiopia
- Hongzhe Li
  - Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
- Christopher D Brown
  - Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
  - Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Sarah A Tishkoff
  - Genetics, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, USA
7
García-Pérez R, Ramirez JM, Ripoll-Cladellas A, Chazarra-Gil R, Oliveros W, Soldatkina O, Bosio M, Rognon PJ, Capella-Gutierrez S, Calvo M, Reverter F, Guigó R, Aguet F, Ferreira PG, Ardlie KG, Melé M. The landscape of expression and alternative splicing variation across human traits. CELL GENOMICS 2022; 3:100244. [PMID: 36777183 PMCID: PMC9903719 DOI: 10.1016/j.xgen.2022.100244] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/11/2022] [Revised: 11/08/2022] [Accepted: 12/07/2022] [Indexed: 12/31/2022]
Abstract
Understanding the consequences of individual transcriptome variation is fundamental to deciphering human biology and disease. We implement a statistical framework to quantify the contributions of 21 individual traits as drivers of gene expression and alternative splicing variation across 46 human tissues and 781 individuals from the Genotype-Tissue Expression project. We demonstrate that ancestry, sex, age, and BMI make additive and tissue-specific contributions to expression variability, whereas interactions are rare. Variation in splicing is dominated by ancestry and is under genetic control in most tissues, with ribosomal proteins showing a strong enrichment of tissue-shared splicing events. Our analyses reveal a systemic contribution of types 1 and 2 diabetes to tissue transcriptome variation with the strongest signal in the nerve, where histopathology image analysis identifies novel genes related to diabetic neuropathy. Our multi-tissue and multi-trait approach provides an extensive characterization of the main drivers of human transcriptome variation in health and disease.
Affiliation(s)
- Raquel García-Pérez
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
- Jose Miguel Ramirez
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
- Aida Ripoll-Cladellas
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
- Ruben Chazarra-Gil
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
- Winona Oliveros
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
- Oleksandra Soldatkina
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
- Mattia Bosio
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
- Paul Joris Rognon
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain; Department of Economics and Business, Universitat Pompeu Fabra, Barcelona, Catalonia 08005, Spain; Department of Statistics and Operations Research, Universitat Politècnica de Catalunya, Barcelona, Catalonia 08034, Spain
- Salvador Capella-Gutierrez
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
- Miquel Calvo
  - Statistics Section, Faculty of Biology, Universitat de Barcelona (UB), Barcelona, Catalonia 08028, Spain
- Ferran Reverter
  - Statistics Section, Faculty of Biology, Universitat de Barcelona (UB), Barcelona, Catalonia 08028, Spain
- Roderic Guigó
  - Bioinformatics and Genomics, Center for Genomic Regulation, Barcelona, Catalonia 08003, Spain
- Pedro G. Ferreira
  - Department of Computer Science, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal; Laboratory of Artificial Intelligence and Decision Support, INESC TEC, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto, Institute for Research and Innovation in Health (i3s), R. Alfredo Allen 208, 4200-135 Porto, Portugal
- Marta Melé (corresponding author)
  - Department of Life Sciences, Barcelona Supercomputing Center (BCN-CNS), Barcelona, Catalonia 08034, Spain
8
A high-throughput real-time PCR tissue-of-origin test to distinguish blood from lymphoblastoid cell line DNA for (epi)genomic studies. Sci Rep 2022; 12:4684. [PMID: 35304543 PMCID: PMC8933453 DOI: 10.1038/s41598-022-08663-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2021] [Accepted: 03/09/2022] [Indexed: 12/13/2022] Open
Abstract
Lymphoblastoid cell lines (LCLs) are derived from blood infected in vitro by Epstein–Barr virus and have been used in several genetic, transcriptomic and epigenomic studies. Although few changes have been shown between LCL and blood genotypes (SNPs), validating their use in genetics, more have been highlighted for other genomic features and/or in their transcriptome and epigenome. This could render them less appropriate for these studies, notably when blood DNA could still be available. Here we developed a simple, high-throughput and cost-effective real-time PCR approach that distinguishes blood from LCL DNA samples based on EBV relative load and the presence of rearranged T-cell receptors γ and β. Our approach achieved 98.5% sensitivity and 100% specificity on DNA of known origin (458 blood and 316 LCL DNA). It was further applied to 1957 DNA samples from the CEPH Aging cohort comprising DNA of uncertain origin, identifying 784 blood and 1016 LCL DNA. A subset of these DNA samples was further analyzed with an epigenetic clock, indicating that DNA extracted from blood should be preferred to LCL DNA for DNA methylation-based age prediction analysis. Our approach could thereby be a powerful tool to ascertain the origin of DNA in old collections prior to (epi)genomic studies.
9
Guo S, Huang S, Jiang X, Hu H, Han D, Moreno CS, Fairbrother GL, Hughes DA, Stoneking M, Khaitovich P. Variation of microRNA expression in the human placenta driven by population identity and sex of the newborn. BMC Genomics 2021; 22:286. [PMID: 33879051 PMCID: PMC8059241 DOI: 10.1186/s12864-021-07542-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/28/2020] [Accepted: 03/22/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Analysis of lymphocyte cell lines revealed substantial differences in the expression of mRNA and microRNA (miRNA) among human populations. The extent of such population-associated differences in actual human tissues remains largely unexplored. The placenta is one of the few solid human tissues that can be collected in substantial numbers in a controlled manner, enabling quantitative analysis of transient biomolecules such as RNA transcripts. Here, we analyzed miRNA expression in human placental samples derived from 36 individuals representing four genetically distinct human populations: African Americans, European Americans, South Asians, and East Asians. All samples were collected at the same hospital following a unified protocol, thus minimizing potential biases that might influence the results. RESULTS: Sequence analysis of the miRNA fraction yielded 938 annotated and 70 novel miRNA transcripts expressed in the placenta. Of them, 82 (9%) of annotated and 11 (16%) of novel miRNAs displayed quantitative expression differences among populations, generally reflecting reported genetic and mRNA-expression-based distances. Several co-expressed miRNA clusters stood out from the rest of the population-associated differences in terms of miRNA evolutionary age, tissue-specificity, and disease-association characteristics. Among three non-environmentally influenced demographic parameters, the second largest contributor to miRNA expression variation after population was the sex of the newborn, with 32 miRNAs (3% of detected) exhibiting significant expression differences depending on whether the newborn was male or female. Male-associated miRNAs were evolutionarily younger and correlated inversely with the expression of target mRNA involved in neuron-related functions. In contrast, both male and female-associated miRNAs appeared to mediate different types of hormonal responses. Demographic factors further affected reported imprinted expression of 66 placental miRNAs: the imprinting strength correlated with the mother's weight, but not height. CONCLUSIONS: Our results showed that among 12 assessed demographic variables, population affiliation and fetal sex had a substantial influence on miRNA expression variation among human placental samples. The effect of newborn-sex-associated miRNA differences further led to expression inhibition of the target genes clustering in specific functional pathways. By contrast, population-driven miRNA differences might mainly represent neutral changes with minimal functional impacts.
Affiliation(s)
- Song Guo
  - Skolkovo Institute of Science and Technology, 121205, Moscow, Russia
- Shuyun Huang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, CAS, 320 Yue Yang Road, Shanghai, 200031, China
- Xi Jiang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, CAS, 320 Yue Yang Road, Shanghai, 200031, China
- Haiyang Hu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, CAS, 320 Yue Yang Road, Shanghai, 200031, China
- Dingding Han
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, CAS, 320 Yue Yang Road, Shanghai, 200031, China
- Carlos S Moreno
  - Department of Pathology and Laboratory Medicine and Department of Biomedical Informatics, Emory University, Atlanta, GA, 30322, USA
- Genevieve L Fairbrother
  - Obstetrics and Gynecology of Atlanta, 1100 Johnson Ferry Rd NE Suite 800, Center 2, Atlanta, GA, 30342, USA
- David A Hughes
  - MRC Integrative Epidemiology Unit at University of Bristol, Bristol, BS8 2BN, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Mark Stoneking
  - Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany
10
Hamdi Y, Zass L, Othman H, Radouani F, Allali I, Hanachi M, Okeke CJ, Chaouch M, Tendwa MB, Samtal C, Mohamed Sallam R, Alsayed N, Turkson M, Ahmed S, Benkahla A, Romdhane L, Souiai O, Tastan Bishop Ö, Ghedira K, Mohamed Fadlelmola F, Mulder N, Kamal Kassim S. Human OMICs and Computational Biology Research in Africa: Current Challenges and Prospects. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2021; 25:213-233. [PMID: 33794662 DOI: 10.1089/omi.2021.0004] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
Following the publication of the first human genome, OMICs research, including genomics, transcriptomics, proteomics, and metagenomics, has been on the rise. OMICs studies revealed the complex genetic diversity among human populations and challenged our understandings of genotype-phenotype correlations. Africa, being the cradle of the first modern humans, is distinguished by a large genetic diversity within its populations and rich ethnolinguistic history. However, the available human OMICs tools and databases are not representative of this diversity, therefore creating significant gaps in biomedical research. African scientists, students, and publics are among the key contributors to OMICs systems science. This expert review examines the pressing issues in human OMICs research, education, and development in Africa, as seen through a lens of computational biology, public health relevant technology innovation, critically-informed science governance, and how best to harness OMICs data to benefit health and societies in Africa and beyond. We underscore the disparities between North and Sub-Saharan Africa at different levels. A harmonized African ethnolinguistic classification would help address annotation challenges associated with population diversity. Finally, building on the existing strategic research initiatives, such as the H3Africa and H3ABioNet Consortia, we highly recommend addressing large-scale multidisciplinary research challenges, strengthening research collaborations and knowledge transfer, and enhancing the ability of African researchers to influence and shape national and international research, policy, and funding agendas. This article and analysis contribute to a deeper understanding of past and current challenges in the African OMICs innovation ecosystem, while also offering foresight on future innovation trajectories.
Affiliation(s)
- Yosr Hamdi
- Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia.,Laboratory of Human and Experimental Pathology, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Lyndon Zass
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Houcemeddine Othman
- Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
| | - Fouzia Radouani
- Chlamydiae and Mycoplasmas Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
| | - Imane Allali
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa.,Laboratory of Human Pathologies Biology, Department of Biology, Faculty of Sciences, and Genomic Center of Human Pathologies, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, Rabat, Morocco
| | - Mariem Hanachi
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia.,Faculty of Science of Bizerte, Zarzouna, University of Carthage, Tunis, Tunisia
| | - Chiamaka Jessica Okeke
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
| | - Melek Chaouch
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Maureen Bilinga Tendwa
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
| | - Chaimae Samtal
- Laboratory of Biotechnology, Environment, Agri-food and Health, Faculty of Sciences Dhar El Mahraz-Sidi Mohammed Ben Abdellah University, Fez, Morocco.,University of Mohamed Premier, Oujda, Morocco
| | - Reem Mohamed Sallam
- Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt.,Department of Basic Medical Sciences, Faculty of Medicine, Galala University, Suez, Egypt
| | - Nihad Alsayed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
| | - Michael Turkson
- The National Institute for Mathematical Sciences, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
| | - Samah Ahmed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
| | - Alia Benkahla
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Lilia Romdhane
- Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia.,Faculty of Science of Bizerte, Zarzouna, University of Carthage, Tunis, Tunisia
| | - Oussema Souiai
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
| | - Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Faisal Mohamed Fadlelmola
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
| | - Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Samar Kamal Kassim
- Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
|
11
|
Shi H, Gazal S, Kanai M, Koch EM, Schoech AP, Siewert KM, Kim SS, Luo Y, Amariuta T, Huang H, Okada Y, Raychaudhuri S, Sunyaev SR, Price AL. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat Commun 2021; 12:1098. [PMID: 33597505 PMCID: PMC7889654 DOI: 10.1038/s41467-021-21286-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 11/15/2019] [Accepted: 01/15/2021] [Indexed: 01/31/2023] Open
Abstract
Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
Affiliation(s)
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Evan M Koch
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Katherine M Siewert
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Samuel S Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yang Luo
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Tiffany Amariuta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
| | - Soumya Raychaudhuri
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
|
12
|
Colbran LL, Gamazon ER, Zhou D, Evans P, Cox NJ, Capra JA. Inferred divergent gene regulation in archaic hominins reveals potential phenotypic differences. Nat Ecol Evol 2019; 3:1598-1606. [PMID: 31591491 PMCID: PMC7046098 DOI: 10.1038/s41559-019-0996-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 09/08/2018] [Accepted: 08/27/2019] [Indexed: 01/01/2023]
Abstract
Sequencing DNA derived from archaic bones has enabled genetic comparison of Neanderthals and anatomically modern humans (AMHs), and revealed that they interbred. However, interpreting what genetic differences imply about their phenotypic differences remains challenging. Here, we introduce an approach for identifying divergent gene regulation between archaic hominins, such as Neanderthals, and AMH sequences, and find 766 genes that are likely to have been divergently regulated (DR) by Neanderthal haplotypes that do not remain in AMHs. DR genes include many involved in phenotypes known to differ between Neanderthals and AMHs, such as the structure of the rib cage and supraorbital ridge development. They are also enriched for genes associated with spontaneous abortion, polycystic ovary syndrome, myocardial infarction and melanoma. Phenotypes associated with modern human variation in these genes' regulation in ~23,000 biobank patients further support their involvement in immune and cardiovascular phenotypes. Comparing DR genes between two Neanderthals and a Denisovan revealed divergence in the immune system and in genes associated with skeletal and dental morphology that are consistent with the archaeological record. These results establish differences in gene regulatory architecture between AMHs and archaic hominins, and provide an avenue for exploring phenotypic differences between archaic groups from genomic information alone.
Affiliation(s)
- Laura L Colbran
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Eric R Gamazon
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Clare Hall, University of Cambridge, Cambridge, UK
| | - Dan Zhou
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Patrick Evans
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Nancy J Cox
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - John A Capra
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA.
- Department Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
|
13
|
Kanai M, Maeda Y, Okada Y. Grimon: graphical interface to visualize multi-omics networks. Bioinformatics 2019; 34:3934-3936. [PMID: 29931190 PMCID: PMC6223372 DOI: 10.1093/bioinformatics/bty488] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/08/2018] [Accepted: 06/14/2018] [Indexed: 11/14/2022] Open
Abstract
Summary: Rapid advances in high-throughput sequencing technologies have enabled more efficient acquisition of massive amount of multi-omics data. However, interpretation of the underlying relationships across multi-omics networks has not been fully succeeded, partly due to the lack of effective methods in visualization. To aid interpretation of the results from such multi-omics data, we here present Grimon (Graphical interface to visualize multi-omics networks), an R package that visualizes high-dimensional multi-layered data sets in three-dimensional parallel coordinates. Grimon enables users to intuitively and interactively explore their analyzed data, helping their understanding of multiple inter-layer connections embedded in high-dimensional complex data. Availability and implementation: Grimon is freely available at https://github.com/mkanai/grimon as an R package with example omics data sets. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Masahiro Kanai
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Yuichi Maeda
- Laboratory of Immune Regulation Graduate School of Medicine, Department of Microbiology and Immunology, WPI Immunology Frontier Research Center, Osaka University, Suita, Japan.,Japan Agency for Medical Research and Development-Core Research for Evolutional Science and Technology, Tokyo, Japan.,Department of Respiratory Medicine and Clinical Immunology, Osaka University Graduate School of Medicine, WPI Immunology Frontier Research Center, Osaka University, Suita, Japan
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan.,Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
|
14
|
Transcriptome variation in human populations and its potential application in forensics. J Appl Genet 2019; 60:319-328. [PMID: 31401728 PMCID: PMC6803616 DOI: 10.1007/s13353-019-00510-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2019] [Revised: 07/22/2019] [Accepted: 07/24/2019] [Indexed: 12/04/2022]
Abstract
This review presents the state-of-the-art in the forensic application of genetic methods driven by the research in population transcriptomics. In the first part of the review, the constraints of using classical genomic markers are shortly reviewed. In the second part, the developments in the field of inter-population diversity at the transcriptomic level are presented. Subsequently, a potential of population-specific transcriptomic markers in forensic science applications, including ascertaining population affiliation of human samples and cell mixtures separation, are presented.
|
15
|
Single-cell RNA sequencing of a European and an African lymphoblastoid cell line. Sci Data 2019; 6:112. [PMID: 31273215 PMCID: PMC6609777 DOI: 10.1038/s41597-019-0116-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/01/2019] [Accepted: 06/07/2019] [Indexed: 01/23/2023] Open
Abstract
In biomedical research, lymphoblastoid cell lines (LCLs), often established by in vitro infection of resting B cells with Epstein-Barr virus, are commonly used as surrogates for peripheral blood lymphocytes. Genomic and transcriptomic information on LCLs has been used to study the impact of genetic variation on gene expression in humans. Here we present single-cell RNA sequencing (scRNA-seq) data on GM12878 and GM18502—two LCLs derived from the blood of female donors of European and African ancestry, respectively. Cells from three samples (the two LCLs and a 1:1 mixture of the two) were prepared separately using a 10x Genomics Chromium Controller and deeply sequenced. The final dataset contained 7,045 cells from GM12878, 5,189 from GM18502, and 5,820 from the mixture, offering valuable information on single-cell gene expression in highly homogenous cell populations. This dataset is a suitable reference for population differentiation in gene expression at the single-cell level. Data from the mixture provide additional valuable information facilitating the development of statistical methods for data normalization and batch effect correction.
Design Type(s) | transcription profiling design • strain comparison design
Measurement Type(s) | transcription profiling assay
Technology Type(s) | RNA sequencing
Factor Type(s) | ancestry status • sex
Sample Characteristic(s) | GM12878 cell • GM18502 cell • immortal human peripheral vein-derived B cell line cell
Machine-accessible metadata file describing the reported data (ISA-Tab format)
|
16
|
Abstract
Risk of disease is multifactorial and can be shaped by socio-economic, demographic, cultural, environmental and genetic factors. Our understanding of the genetic determinants of disease risk has greatly advanced with the advent of genome-wide association studies (GWAS), which detect associations between genetic variants and complex traits or diseases by comparing populations of cases and controls. However, much of this discovery has occurred through GWAS of individuals of European ancestry, with limited representation of other populations, including from Africa, The Americas, Asia and Oceania. Population demography, genetic drift and adaptation to environments over thousands of years have led globally to the diversification of populations. This global genomic diversity can provide new opportunities for discovery and translation into therapies, as well as a better understanding of population disease risk. Large-scale multi-ethnic and representative biobanks and population health resources provide unprecedented opportunities to understand the genetic determinants of disease on a global scale.
|
17
|
Schaefke B, Sun W, Li YS, Fang L, Chen W. The evolution of posttranscriptional regulation. WILEY INTERDISCIPLINARY REVIEWS-RNA 2018; 9:e1485. [PMID: 29851258 DOI: 10.1002/wrna.1485] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 02/08/2018] [Revised: 04/23/2018] [Accepted: 04/26/2018] [Indexed: 12/13/2022]
Abstract
"DNA makes RNA makes protein." After transcription, mRNAs undergo a series of intertwining processes to be finally translated into functional proteins. The "posttranscriptional" regulation (PTR) provides cells an extended option to fine-tune their proteomes. To meet the demands of complex organism development and the appropriate response to environmental stimuli, every step in these processes needs to be finely regulated. Moreover, changes in these regulatory processes are important driving forces underlying the evolution of phenotypic differences across different species. The major PTR mechanisms discussed in this review include the regulation of splicing, polyadenylation, decay, and translation. For alternative splicing and polyadenylation, we mainly discuss their evolutionary dynamics and the genetic changes underlying the regulatory differences in cis-elements versus trans-factors. For mRNA decay and translation, which, together with transcription, determine the cellular RNA or protein abundance, we focus our discussion on how their divergence coordinates with transcriptional changes to shape the evolution of gene expression. Then to highlight the importance of PTR in the evolution of higher complexity, we focus on their roles in two major phenomena during eukaryotic evolution: the evolution of multicellularity and the division of labor between different cell types and tissues; and the emergence of diverse, often highly specialized individual phenotypes, especially those concerning behavior in eusocial insects. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution; Translation > Translation Regulation; RNA Processing > Splicing Regulation/Alternative Splicing.
Affiliation(s)
- Bernhard Schaefke
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Wei Sun
- Department of Biology, Southern University of Science and Technology, Shenzhen, China.,Department of Pharmaceutical Chemistry and Cardiovascular Research Institute, University of California San Francisco, San Francisco
| | - Yi-Sheng Li
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Liang Fang
- Department of Biology, Southern University of Science and Technology, Shenzhen, China.,Medi-X Institute, SUSTech Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| | - Wei Chen
- Department of Biology, Southern University of Science and Technology, Shenzhen, China.,Medi-X Institute, SUSTech Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
|
18
|
Gokcumen O. The Year In Genetic Anthropology: New Lands, New Technologies, New Questions. AMERICAN ANTHROPOLOGIST 2018. [DOI: 10.1111/aman.13032] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Affiliation(s)
- Omer Gokcumen
- Department of Biological Sciences, University of Buffalo, NY 14260, USA
|
19
|
Jordan DM, Do R. Using Full Genomic Information to Predict Disease: Breaking Down the Barriers Between Complex and Mendelian Diseases. Annu Rev Genomics Hum Genet 2018; 19:289-301. [PMID: 29641912 DOI: 10.1146/annurev-genom-083117-021136] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
While sequence-based genetic tests have long been available for specific loci, especially for Mendelian disease, the rapidly falling costs of genome-wide genotyping arrays, whole-exome sequencing, and whole-genome sequencing are moving us toward a future where full genomic information might inform the prognosis and treatment of a variety of diseases, including complex disease. Similarly, the availability of large populations with full genomic information has enabled new insights about the etiology and genetic architecture of complex disease. Insights from the latest generation of genomic studies suggest that our categorization of diseases as complex may conceal a wide spectrum of genetic architectures and causal mechanisms that ranges from Mendelian forms of complex disease to complex regulatory structures underlying Mendelian disease. Here, we review these insights, along with advances in the prediction of disease risk and outcomes from full genomic information.
Affiliation(s)
- Daniel M Jordan
- Charles Bronfman Institute for Personalized Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Ron Do
- Charles Bronfman Institute for Personalized Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
|
20
|
Park DS, Eskin I, Kang EY, Gamazon ER, Eng C, Gignoux CR, Galanter JM, Burchard E, Ye CJ, Aschard H, Eskin E, Halperin E, Zaitlen N. An ancestry-based approach for detecting interactions. Genet Epidemiol 2018; 42:49-63. [PMID: 29114909 PMCID: PMC6065511 DOI: 10.1002/gepi.22087] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/05/2017] [Revised: 09/06/2017] [Accepted: 09/08/2017] [Indexed: 12/31/2022]
Abstract
BACKGROUND Epistasis and gene-environment interactions are known to contribute significantly to variation of complex phenotypes in model organisms. However, their identification in human association studies remains challenging for myriad reasons. In the case of epistatic interactions, the large number of potential interacting sets of genes presents computational, multiple hypothesis correction, and other statistical power issues. In the case of gene-environment interactions, the lack of consistently measured environmental covariates in most disease studies precludes searching for interactions and creates difficulties for replicating studies. RESULTS In this work, we develop a new statistical approach to address these issues that leverages genetic ancestry, defined as the proportion of ancestry derived from each ancestral population (e.g., the fraction of European/African ancestry in African Americans), in admixed populations. We applied our method to gene expression and methylation data from African American and Latino admixed individuals, respectively, identifying nine interactions that were significant at P < 5×10⁻⁸. We show that two of the interactions in methylation data replicate, and the remaining six are significantly enriched for low P-values (P < 1.8×10⁻⁶). CONCLUSION We show that genetic ancestry can be a useful proxy for unknown and unmeasured covariates in the search for interaction effects. These results have important implications for our understanding of the genetic architecture of complex traits.
Affiliation(s)
- Danny S. Park
- Department of Bioengineering and Therapeutic Sciences. University of California San Francisco. San Francisco, CA
| | - Itamar Eskin
- The Blavatnik School of Computer Science. Tel-Aviv University. Tel Aviv, Israel
| | - Eun Yong Kang
- Department of Computer Science. University of California Los Angeles. Los Angeles, CA
| | - Eric R. Gamazon
- Division of Genetic Medicine, Department of Medicine. Vanderbilt University. Nashville, TN
- Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
| | - Celeste Eng
- Department of Medicine. University of California San Francisco. San Francisco, CA
| | - Christopher R. Gignoux
- Department of Bioengineering and Therapeutic Sciences. University of California San Francisco. San Francisco, CA
- Department of Genetics. Stanford University. Palo Alto, CA
| | - Joshua M. Galanter
- Department of Medicine. University of California San Francisco. San Francisco, CA
| | - Esteban Burchard
- Department of Bioengineering and Therapeutic Sciences. University of California San Francisco. San Francisco, CA
- Department of Medicine. University of California San Francisco. San Francisco, CA
| | - Chun J. Ye
- Institute of Human Genetics. University of California San Francisco. San Francisco, CA
| | - Hugues Aschard
- Department of Epidemiology. Harvard School of Public Health. Boston, MA
| | - Eleazar Eskin
- Department of Computer Science. University of California Los Angeles. Los Angeles, CA
| | - Eran Halperin
- The Blavatnik School of Computer Science. Tel-Aviv University. Tel Aviv, Israel
| | - Noah Zaitlen
- Department of Bioengineering and Therapeutic Sciences. University of California San Francisco. San Francisco, CA
- Department of Medicine. University of California San Francisco. San Francisco, CA
|
21
|
Tian L, Khan A, Ning Z, Yuan K, Zhang C, Lou H, Yuan Y, Xu S. Genome-wide comparison of allele-specific gene expression between African and European populations. Hum Mol Genet 2018; 27:1067-1077. [DOI: 10.1093/hmg/ddy027] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/14/2017] [Accepted: 01/05/2018] [Indexed: 11/12/2022] Open
Affiliation(s)
- Lei Tian
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Asifullah Khan
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
- Department of Biochemistry, Abdul Wali Khan University Mardan, Mardan-23200 KP, Pakistan
| | - Zhilin Ning
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Kai Yuan
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Chao Zhang
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Haiyi Lou
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
- Yuan Yuan
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
- Shuhua Xu
- Chinese Academy of Sciences Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China
- Collaborative Innovation Center of Genetics and Development, Shanghai 200438, China
|
22
|
Yablonovitch AL, Fu J, Li K, Mahato S, Kang L, Rashkovetsky E, Korol AB, Tang H, Michalak P, Zelhof AC, Nevo E, Li JB. Regulation of gene expression and RNA editing in Drosophila adapting to divergent microclimates. Nat Commun 2017; 8:1570. [PMID: 29146998 PMCID: PMC5691062 DOI: 10.1038/s41467-017-01658-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 02/07/2017] [Accepted: 10/05/2017] [Indexed: 12/30/2022] Open
Abstract
Determining the mechanisms by which a species adapts to its environment is a key endeavor in the study of evolution. In particular, relatively little is known about how transcriptional processes are fine-tuned to adjust to different environmental conditions. Here we study Drosophila melanogaster from 'Evolution Canyon' in Israel, which consists of two opposing slopes with divergent microclimates. We identify several hundred differentially expressed genes and dozens of differentially edited sites between flies from each slope, correlate these changes with genetic differences, and use CRISPR mutagenesis to validate that an intronic SNP in prominin regulates its editing levels. We also demonstrate that while temperature affects editing levels at more sites than genetic differences, genetically regulated sites tend to be less affected by temperature. This work shows the extent to which gene expression and RNA editing differ between flies from different microclimates, and provides insights into the regulation responsible for these differences.
Affiliation(s)
- Arielle L Yablonovitch
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Biophysics Program, Stanford University, Stanford, CA, 94305, USA
- Jeremy Fu
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Kexin Li
- Institute of Evolution, University of Haifa, Haifa, 3498838, Israel
- Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
- Simpla Mahato
- Department of Biology, Indiana University, Bloomington, IN, 47405, USA
- Lin Kang
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, 24060, USA
- Abraham B Korol
- Institute of Evolution, University of Haifa, Haifa, 3498838, Israel
- Hua Tang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Pawel Michalak
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, 24060, USA
- Biocomplexity Institute, Virginia Tech, Blacksburg, VA, 24061, USA
- Center for One Health Research, Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, 24060, USA
- Andrew C Zelhof
- Department of Biology, Indiana University, Bloomington, IN, 47405, USA
- Eviatar Nevo
- Institute of Evolution, University of Haifa, Haifa, 3498838, Israel
- Jin Billy Li
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Biophysics Program, Stanford University, Stanford, CA, 94305, USA
|
23
|
Worldwide patterns of human epigenetic variation. Nat Ecol Evol 2017; 1:1577-1583. [PMID: 29185505 DOI: 10.1038/s41559-017-0299-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/24/2016] [Accepted: 07/27/2017] [Indexed: 11/08/2022]
Abstract
DNA methylation is an epigenetic modification, influenced by both genetic and environmental variation, that plays a key role in transcriptional regulation and many organismal phenotypes. Although patterns of DNA methylation have been shown to differ between human populations, it remains to be determined how epigenetic diversity relates to the patterns of genetic and gene expression variation at a global scale. Here we measured DNA methylation at 485,000 CpG sites in five diverse human populations, and analysed these data together with genome-wide genotype and gene expression data. We found that population-specific DNA methylation mirrors genetic variation, and has greater local genetic control than mRNA levels. We estimated the rate of epigenetic divergence between populations, which indicates far greater evolutionary stability of DNA methylation in humans than has been observed in plants. This study provides a deeper understanding of worldwide patterns of human epigenetic diversity, as well as initial estimates of the rate of epigenetic divergence in recent human evolution.
|
24
|
Kelly DE, Hansen MEB, Tishkoff SA. Global variation in gene expression and the value of diverse sampling. CURRENT OPINION IN SYSTEMS BIOLOGY 2017; 1:102-108. [PMID: 28596996 PMCID: PMC5458633 DOI: 10.1016/j.coisb.2016.12.018] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 11/25/2022]
Abstract
The genomics era has accelerated our understanding of how genetic and epigenetic factors influence both normal variable traits and disease risk in humans. However, the majority of "omics" studies have focused on individuals living in urban centers, primarily from Europe and Asia, neglecting much of the genetic and environmental variation that exists across worldwide populations. Comparative studies of gene regulation in ethnically diverse populations are informing our understanding of how evolutionary forces have shaped the genetic and molecular mechanisms underlying complex traits, and studying gene expression in different environmental contexts is enabling the dissection of disease-related pathways such as immune response. Such approaches are vital to the equitable application of genomics and medicine.
Affiliation(s)
- Derek E. Kelly
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sarah A. Tishkoff
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
|
25
|
Genetic Adaptation and Neandertal Admixture Shaped the Immune System of Human Populations. Cell 2016; 167:643-656.e17. [PMID: 27768888 PMCID: PMC5075285 DOI: 10.1016/j.cell.2016.09.024] [Citation(s) in RCA: 251] [Impact Index Per Article: 31.4] [Received: 04/12/2016] [Revised: 07/14/2016] [Accepted: 09/15/2016] [Indexed: 12/30/2022]
Abstract
Humans differ in the outcome that follows exposure to life-threatening pathogens, yet the extent of population differences in immune responses and their genetic and evolutionary determinants remain undefined. Here, we characterized, using RNA sequencing, the transcriptional response of primary monocytes from Africans and Europeans to bacterial and viral stimuli-ligands activating Toll-like receptor pathways (TLR1/2, TLR4, and TLR7/8) and influenza virus-and mapped expression quantitative trait loci (eQTLs). We identify numerous cis-eQTLs that contribute to the marked differences in immune responses detected within and between populations and a strong trans-eQTL hotspot at TLR1 that decreases expression of pro-inflammatory genes in Europeans only. We find that immune-responsive regulatory variants are enriched in population-specific signals of natural selection and show that admixture with Neandertals introduced regulatory variants into European genomes, affecting preferentially responses to viral challenges. Together, our study uncovers evolutionarily important determinants of differences in host immune responsiveness between human populations.
|
26
|
Ancient Out-of-Africa Mitochondrial DNA Variants Associate with Distinct Mitochondrial Gene Expression Patterns. PLoS Genet 2016; 12:e1006407. [PMID: 27812116 PMCID: PMC5094714 DOI: 10.1371/journal.pgen.1006407] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 05/03/2016] [Accepted: 10/06/2016] [Indexed: 11/19/2022] Open
Abstract
Mitochondrial DNA (mtDNA) variants have been traditionally used as markers to trace ancient population migrations. Although experiments relying on model organisms and cytoplasmic hybrids, as well as disease association studies, have served to underline the functionality of certain mtDNA SNPs, only little is known of the regulatory impact of ancient mtDNA variants, especially in terms of gene expression. By analyzing RNA-seq data of 454 lymphoblast cell lines from the 1000 Genomes Project, we found that mtDNA variants defining the most common African genetic background, the L haplogroup, exhibit a distinct overall mtDNA gene expression pattern, which was independent of mtDNA copy numbers. Secondly, intra-population analysis revealed subtle, yet significant, expression differences in four tRNA genes. Strikingly, the more prominent African mtDNA gene expression pattern best correlated with the expression of nuclear DNA-encoded RNA-binding proteins, and with SNPs within the mitochondrial RNA-binding proteins PTCD1 and MRPS7. Our results thus support the concept of an ancient regulatory transition of mtDNA-encoded genes as humans left Africa to populate the rest of the world. The mitochondrion is an organelle found in all cells of our body and plays a significant role in the energy and heat production. This is the only organelle in animal cells harboring its own genome outside of the nucleus. Mitochondrial DNA (mtDNA) variants have been traditionally used as neutral markers to trace ancient population migrations. As a result, the functional impact of human mtDNA population variants on gene regulation is poorly understood. To address this question, we analyzed available data of mtDNA gene expression pattern in a large group of individuals (454) from diverse human populations. 
Here, we show for the first time that the ancient migration of humans out of Africa correlated with differences in mitochondrial gene expression patterns, and could be explained by the activity of certain RNA-binding proteins. These findings suggest a major mitochondrial regulatory transition, as humans left Africa to populate the rest of the world.
|
27
|
Ancestral Origins and Genetic History of Tibetan Highlanders. Am J Hum Genet 2016; 99:580-594. [PMID: 27569548 DOI: 10.1016/j.ajhg.2016.07.002] [Citation(s) in RCA: 129] [Impact Index Per Article: 16.1] [Received: 05/09/2016] [Accepted: 07/01/2016] [Indexed: 12/30/2022] Open
Abstract
The origin of Tibetans remains one of the most contentious puzzles in history, anthropology, and genetics. Analyses of deeply sequenced (30×-60×) genomes of 38 Tibetan highlanders and 39 Han Chinese lowlanders, together with available data on archaic and modern humans, allow us to comprehensively characterize the ancestral makeup of Tibetans and uncover their origins. Non-modern human sequences compose ∼6% of the Tibetan gene pool and form unique haplotypes in some genomic regions, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled and elevated. The shared ancestry of Tibetan-enriched sequences dates back to ∼62,000-38,000 years ago, predating the Last Glacial Maximum (LGM) and representing early colonization of the plateau. Nonetheless, most of the Tibetan gene pool is of modern human origin and diverged from that of Han Chinese ∼15,000 to ∼9,000 years ago, which can be largely attributed to post-LGM arrivals. Analysis of ∼200 contemporary populations showed that Tibetans share ancestry with populations from East Asia (∼82%), Central Asia and Siberia (∼11%), South Asia (∼6%), and western Eurasia and Oceania (∼1%). Our results support that Tibetans arose from a mixture of multiple ancestral gene pools but that their origins are much more complicated and ancient than previously suspected. We provide compelling evidence of the co-existence of Paleolithic and Neolithic ancestries in the Tibetan gene pool, indicating a genetic continuity between pre-historical highland-foragers and present-day Tibetans. In particular, highly differentiated sequences harbored in highlanders' genomes were most likely inherited from pre-LGM settlers of multiple ancestral origins (SUNDer) and maintained in high frequency by natural selection.
|
28
|
Oetjens MT, Shen F, Emery SB, Zou Z, Kidd JM. Y-Chromosome Structural Diversity in the Bonobo and Chimpanzee Lineages. Genome Biol Evol 2016; 8:2231-40. [PMID: 27358426 PMCID: PMC4987114 DOI: 10.1093/gbe/evw150] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/23/2022] Open
Abstract
The male-specific regions of primate Y-chromosomes (MSY) are enriched for multi-copy genes highly expressed in the testis. These genes are located in large repetitive sequences arranged as palindromes, inverted-, and tandem repeats termed amplicons. In humans, these genes have critical roles in male fertility and are essential for the production of sperm. The structure of human and chimpanzee amplicon sequences shows remarkable differences relative to the remainder of the genome, a difference that may be the result of intense selective pressure on male fertility. Four subspecies of common chimpanzees have undergone extended periods of isolation and appear to be in the early process of subspeciation. A recent study found that amplicons enriched for testis-expressed genes on the primate X-chromosome are the target of hard selective sweeps, and male-fertility genes on the Y-chromosome may also be the targets of selection. However, little is understood about Y-chromosome amplicon diversity within and across chimpanzee populations. Here, we analyze nine common chimpanzee (representing three subspecies: Pan troglodytes schweinfurthii, Pan troglodytes ellioti, and Pan troglodytes verus) and two bonobo (Pan paniscus) male whole-genome sequences to assess Y ampliconic copy-number diversity across the Pan genus. We observe that the copy number of Y chromosome amplicons is variable among chimpanzees and bonobos, and identify several lineage-specific patterns, including variable copy number of azoospermia candidates RBMY and DAZ. We detect recurrent switchpoints of copy-number change along the ampliconic tracts across chimpanzee populations, which may be the result of localized genome instability or selective forces.
Affiliation(s)
- Feichen Shen
- Department of Human Genetics, University of Michigan Medical School
- Sarah B Emery
- Department of Human Genetics, University of Michigan Medical School
- Zhengting Zou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School
- Jeffrey M Kidd
- Department of Human Genetics, University of Michigan Medical School
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School
|
29
|
Discovery of unfixed endogenous retrovirus insertions in diverse human populations. Proc Natl Acad Sci U S A 2016; 113:E2326-34. [PMID: 27001843 DOI: 10.1073/pnas.1602336113] [Citation(s) in RCA: 165] [Impact Index Per Article: 20.6] [Indexed: 11/18/2022] Open
Abstract
Endogenous retroviruses (ERVs) have contributed to more than 8% of the human genome. The majority of these elements lack function due to accumulated mutations or internal recombination resulting in a solitary (solo) LTR, although members of one group of human ERVs (HERVs), HERV-K, were recently active with members that remain nearly intact, a subset of which is present as insertionally polymorphic loci that include approximately full-length (2-LTR) and solo-LTR alleles in addition to the unoccupied site. Several 2-LTR insertions have intact reading frames in some or all genes that are expressed as functional proteins. These properties reflect the activity of HERV-K and suggest the existence of additional unique loci within humans. We sought to determine the extent to which other polymorphic insertions are present in humans, using sequenced genomes from the 1000 Genomes Project and a subset of the Human Genome Diversity Project panel. We report analysis of a total of 36 nonreference polymorphic HERV-K proviruses, including 19 newly reported loci, with insertion frequencies ranging from <0.0005 to >0.75 that varied by population. Targeted screening of individual loci identified three new unfixed 2-LTR proviruses within our set, including an intact provirus present at Xq21.33 in some individuals, with the potential for retained infectivity.
|
30
|
Long non-coding RNAs display higher natural expression variation than protein-coding genes in healthy humans. Genome Biol 2016; 17:14. [PMID: 26821746 PMCID: PMC4731934 DOI: 10.1186/s13059-016-0873-8] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 10/27/2015] [Accepted: 01/06/2016] [Indexed: 02/06/2023] Open
Abstract
Background Long non-coding RNAs (lncRNAs) are increasingly implicated as gene regulators and may ultimately be more numerous than protein-coding genes in the human genome. Despite large numbers of reported lncRNAs, reference annotations are likely incomplete due to their lower and tighter tissue-specific expression compared to mRNAs. An unexplored factor potentially confounding lncRNA identification is inter-individual expression variability. Here, we characterize lncRNA natural expression variability in human primary granulocytes. Results We annotate granulocyte lncRNAs and mRNAs in RNA-seq data from 10 healthy individuals, identifying multiple lncRNAs absent from reference annotations, and use this to investigate three known features (higher tissue-specificity, lower expression, and reduced splicing efficiency) of lncRNAs relative to mRNAs. Expression variability was examined in seven individuals sampled three times at 1- or more than 1-month intervals. We show that lncRNAs display significantly more inter-individual expression variability compared to mRNAs. We confirm this finding in two independent human datasets by analyzing multiple tissues from the GTEx project and lymphoblastoid cell lines from the GEUVADIS project. Using the latter dataset we also show that including more human donors into the transcriptome annotation pipeline allows identification of an increasing number of lncRNAs, but minimally affects mRNA gene number. Conclusions A comprehensive annotation of lncRNAs is known to require an approach that is sensitive to low and tight tissue-specific expression. Here we show that increased inter-individual expression variability is an additional general lncRNA feature to consider when creating a comprehensive annotation of human lncRNAs or proposing their use as prognostic or disease markers. 
|
31
|
Wildschutte JH, Baron A, Diroff NM, Kidd JM. Discovery and characterization of Alu repeat sequences via precise local read assembly. Nucleic Acids Res 2015; 43:10292-307. [PMID: 26503250 PMCID: PMC4666360 DOI: 10.1093/nar/gkv1089] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 05/18/2015] [Accepted: 10/08/2015] [Indexed: 12/03/2022] Open
Abstract
Alu insertions have contributed to >11% of the human genome and ∼30–35 Alu subfamilies remain actively mobile, yet the characterization of polymorphic Alu insertions from short-read data remains a challenge. We build on existing computational methods to combine Alu detection and de novo assembly of WGS data as a means to reconstruct the full sequence of insertion events from Illumina paired end reads. Comparison with published calls obtained using PacBio long-reads indicates a false discovery rate below 5%, at the cost of reduced sensitivity due to the colocation of reference and non-reference repeats. We generate a highly accurate call set of 1614 completely assembled Alu variants from 53 samples from the Human Genome Diversity Project (HGDP) panel. We utilize the reconstructed alternative insertion haplotypes to genotype 1010 fully assembled insertions, obtaining >99% agreement with genotypes obtained by PCR. In our assembled sequences, we find evidence of premature insertion mechanisms and observe 5′ truncation in 16% of AluYa5 and AluYb8 insertions. The sites of truncation coincide with stem-loop structures and SRP9/14 binding sites in the Alu RNA, implicating L1 ORF2p pausing in the generation of 5′ truncations. Additionally, we identified variable AluJ and AluS elements that likely arose due to non-retrotransposition mechanisms.
Affiliation(s)
- Julia H Wildschutte
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Alayna Baron
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Nicolette M Diroff
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Jeffrey M Kidd
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
32
|
The human gene damage index as a gene-level approach to prioritizing exome variants. Proc Natl Acad Sci U S A 2015; 112:13615-20. [PMID: 26483451 DOI: 10.1073/pnas.1518646112] [Citation(s) in RCA: 172] [Impact Index Per Article: 19.1] [Indexed: 12/18/2022] Open
Abstract
The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches, genic intolerance, and de novo excess, and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.
|
33
|
Sudheesh S, Sawbridge TI, Cogan NO, Kennedy P, Forster JW, Kaur S. De novo assembly and characterisation of the field pea transcriptome using RNA-Seq. BMC Genomics 2015; 16:611. [PMID: 26275991 PMCID: PMC4537571 DOI: 10.1186/s12864-015-1815-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 01/12/2015] [Accepted: 05/15/2015] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Field pea (Pisum sativum L.) is a cool-season grain legume that is cultivated world-wide for both human consumption and stock-feed purposes. Enhancement of genetic and genomic resources for field pea will permit improved understanding of the control of traits relevant to crop productivity and quality. Advances in second-generation sequencing and associated bioinformatics analysis now provide unprecedented opportunities for the development of such resources. The objective of this study was to perform transcriptome sequencing and characterisation from two genotypes of field pea that differ in terms of seed and plant morphological characteristics. RESULTS Transcriptome sequencing was performed with RNA templates from multiple tissues of the field pea genotypes Kaspa and Parafield. Tissue samples were collected at various growth stages, and a total of 23 cDNA libraries were sequenced using Illumina high-throughput sequencing platforms. A total of 407 and 352 million paired-end reads from the Kaspa and Parafield transcriptomes, respectively were assembled into 129,282 and 149,272 contigs, which were filtered on the basis of known gene annotations, presence of open reading frames (ORFs), reciprocal matches and degree of coverage. Totals of 126,335 contigs from Kaspa and 145,730 from Parafield were subsequently selected as the reference set. Reciprocal sequence analysis revealed that c. 87% of contigs were expressed in both cultivars, while a small proportion were unique to each genotype. Reads from different libraries were aligned to the genotype-specific assemblies in order to identify and characterise expression of contigs on a tissue-specific basis, of which 87% were expressed in more than one tissue, while others showed distinct expression patterns in specific tissues, providing unique transcriptome signatures. 
CONCLUSION This study provided a comprehensive assembled and annotated transcriptome set for field pea that can be used for development of genetic markers, in order to assess genetic diversity, construct linkage maps, perform trait-dissection and implement whole-genome selection strategies in varietal improvement programs, as well to identify target genes for genetic modification approaches on the basis of annotation and expression analysis. In addition, the reference field pea transcriptome will prove highly valuable for comparative genomics studies and construction of a finalised genome sequence.
Affiliation(s)
- Shimna Sudheesh
- Department of Economic Development, Jobs, Transport and Resources, Biosciences Research Division, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, VIC, 3083, Australia.
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3086, Australia.
- Timothy I Sawbridge
- Department of Economic Development, Jobs, Transport and Resources, Biosciences Research Division, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, VIC, 3083, Australia.
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3086, Australia.
- Noel Oi Cogan
- Department of Economic Development, Jobs, Transport and Resources, Biosciences Research Division, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, VIC, 3083, Australia.
- Peter Kennedy
- Department of Economic Development, Jobs, Transport and Resources, Biosciences Research Division, Grains Innovation Park, Horsham, VIC, 3401, Australia.
- John W Forster
- Department of Economic Development, Jobs, Transport and Resources, Biosciences Research Division, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, VIC, 3083, Australia.
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3086, Australia.
- Sukhjiwan Kaur
- Department of Economic Development, Jobs, Transport and Resources, Biosciences Research Division, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, VIC, 3083, Australia.
|
34
|
Lipscombe D, Pan JQ, Schorge S. Tracks through the genome to physiological events. Exp Physiol 2015; 100:1429-40. [PMID: 26053180 PMCID: PMC5008151 DOI: 10.1113/ep085129] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/06/2015] [Accepted: 06/02/2015] [Indexed: 12/16/2022]
Abstract
New Findings
- What is the topic of this review? We discuss tools available to access genome‐wide data sets that harbour cell‐specific, brain region‐specific and tissue‐specific information on exon usage for several species, including humans. In this Review, we demonstrate how to access this information in genome databases and its enormous value to physiology.
- What advances does it highlight? The sheer scale of protein diversity that is possible from complex genes, including those that encode voltage‐gated ion channels, is vast. But this choice is critical for a complete understanding of protein function in the most physiologically relevant context.
Many proteins of great interest to physiologists and neuroscientists are structurally complex and located in specialized subcellular domains, such as neuronal synapses and transverse tubules of muscle. Genes that encode these critical signalling molecules (receptors, ion channels, transporters, enzymes, cell adhesion molecules, cell–cell interaction proteins and cytoskeletal proteins) are similarly complex. Typically, these genes are large; human Dystrophin (DMD) encodes a cytoskeletal protein of muscle and it is the largest naturally occurring gene at a staggering 2.3 Mb. Large genes contain many non‐coding introns and coding exons; human Titin (TTN), which encodes a protein essential for the assembly and functioning of vertebrate striated muscles, has over 350 exons and consequently has an enormous capacity to generate different forms of Titin mRNAs that have unique exon combinations. Functional and pharmacological differences among protein isoforms originating from the same gene may be subtle but nonetheless of critical physiological significance. Standard functional, immunological and pharmacological approaches, so useful for characterizing proteins encoded by different genes, typically fail to discriminate among splice isoforms of individual genes. Tools are now available to access genome‐wide data sets that harbour cell‐specific, brain region‐specific and tissue‐specific information on exon usage for several species, including humans. In this Review, we demonstrate how to access this information in genome databases and its enormous value to physiology.
Affiliation(s)
- Diane Lipscombe
  - Department of Neuroscience, Brown University, Providence, RI, USA
- Jen Q Pan
  - Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
|
35
|
Integrated genomics identifies convergence of ankylosing spondylitis with global immune mediated disease pathways. Sci Rep 2015; 5:10314. [PMID: 25980808 PMCID: PMC4434845 DOI: 10.1038/srep10314] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/20/2014] [Accepted: 04/08/2015] [Indexed: 12/17/2022]
Abstract
Ankylosing spondylitis (AS) is a highly heritable complex inflammatory arthritis. Although a handful of non-HLA risk loci have been identified, capturing the unexplained genetic contribution to AS pathogenesis remains a challenge attributed to additive, pleiotropic and epistatic interactions at the molecular level. Here, we developed multiple integrated genomic approaches to quantify molecular convergence of non-HLA loci with global immune mediated diseases. We show that non-HLA genes are significantly sensitive to deleterious mutation accumulation in the general population compared with tolerant genes. Human developmental proteomics (prenatal to adult) analysis revealed that proteins encoded by non-HLA AS risk loci are 2-fold more expressed in adult hematopoietic cells. Enrichment analysis revealed that AS risk genes overlap with a significant number of immune related pathways (p < 0.0001 to 9.8 × 10⁻¹²). Protein-protein interaction analysis revealed that non-shared AS risk genes are highly clustered seeds that significantly converge (empirical; p < 0.01 to 1.6 × 10⁻⁴) into networks of global immune mediated disease risk loci. We have also provided initial evidence for the involvement of STAT2/3 in AS pathogenesis. Collectively, these findings highlight molecular insight on non-HLA AS risk loci that are not exclusively connected with overlapping immune mediated diseases; rather, they are a component of common pathophysiological pathways with other immune mediated diseases. This information will be pivotal to fully explain AS pathogenesis and identify new therapeutic targets.
|
36
|
Torres JM, Gamazon ER, Parra EJ, Below JE, Valladares-Salgado A, Wacher N, Cruz M, Hanis CL, Cox NJ. Cross-tissue and tissue-specific eQTLs: partitioning the heritability of a complex trait. Am J Hum Genet 2014; 95:521-34. [PMID: 25439722 PMCID: PMC4225593 DOI: 10.1016/j.ajhg.2014.10.001] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 07/03/2014] [Accepted: 10/01/2014] [Indexed: 01/10/2023]
Abstract
Top signals from genome-wide association studies (GWASs) of type 2 diabetes (T2D) are enriched with expression quantitative trait loci (eQTLs) identified in skeletal muscle and adipose tissue. We therefore hypothesized that such eQTLs might account for a disproportionate share of the heritability estimated from all SNPs interrogated through GWASs. To test this hypothesis, we applied linear mixed models to the Wellcome Trust Case Control Consortium (WTCCC) T2D data set and to data sets representing Mexican Americans from Starr County, TX, and Mexicans from Mexico City. We estimated the proportion of phenotypic variance attributable to the additive effect of all variants interrogated in these GWASs, as well as a much smaller set of variants identified as eQTLs in human adipose tissue, skeletal muscle, and lymphoblastoid cell lines. The narrow-sense heritability explained by all interrogated SNPs in each of these data sets was substantially greater than the heritability accounted for by genome-wide-significant SNPs (∼10%); GWAS SNPs explained over 50% of phenotypic variance in the WTCCC, Starr County, and Mexico City data sets. The estimate of heritability attributable to cross-tissue eQTLs was greater in the WTCCC data set and among lean Hispanics, whereas adipose eQTLs significantly explained heritability among Hispanics with a body mass index ≥ 30. These results support an important role for regulatory variants in the genetic component of T2D susceptibility, particularly for eQTLs that elicit effects across insulin-responsive peripheral tissues.
Affiliation(s)
- Jason M Torres
  - Committee on Molecular Metabolism and Nutrition, University of Chicago, Chicago, IL 60637, USA
- Eric R Gamazon
  - Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Esteban J Parra
  - Department of Anthropology, University of Toronto at Mississauga, Mississauga, ON L5L 1C6, Canada
- Jennifer E Below
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77225, USA
- Adan Valladares-Salgado
  - Unidades de Investigacion Medica en Bioquimica y Unidad de Epidemiologia Clinica, Hospital de Especialidades, Centro Medico Nacional "Siglo XXI," Instituto Mexicano del Seguro Social, Mexico City, CP 06720, Mexico
- Niels Wacher
  - Unidades de Investigacion Medica en Bioquimica y Unidad de Epidemiologia Clinica, Hospital de Especialidades, Centro Medico Nacional "Siglo XXI," Instituto Mexicano del Seguro Social, Mexico City, CP 06720, Mexico
- Miguel Cruz
  - Unidades de Investigacion Medica en Bioquimica y Unidad de Epidemiologia Clinica, Hospital de Especialidades, Centro Medico Nacional "Siglo XXI," Instituto Mexicano del Seguro Social, Mexico City, CP 06720, Mexico
- Craig L Hanis
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77225, USA
- Nancy J Cox
  - Department of Medicine, University of Chicago, Chicago, IL 60637, USA
|
37
|
Dayama G, Emery SB, Kidd JM, Mills RE. The genomic landscape of polymorphic human nuclear mitochondrial insertions. Nucleic Acids Res 2014; 42:12640-9. [PMID: 25348406 PMCID: PMC4227756 DOI: 10.1093/nar/gku1038] [Citation(s) in RCA: 122] [Impact Index Per Article: 12.2] [Indexed: 02/07/2023]
Abstract
The transfer of mitochondrial genetic material into the nuclear genomes of eukaryotes is a well-established phenomenon that has been previously limited to the study of static reference genomes. The recent advancement of high throughput sequencing has enabled an expanded exploration into the diversity of polymorphic nuclear mitochondrial insertions (NumtS) within human populations. We have developed an approach to discover and genotype novel Numt insertions using whole genome, paired-end sequencing data. We have applied this method to a thousand individuals in 20 populations from the 1000 Genomes Project and other datasets and identified 141 new sites of Numt insertions, extending our current knowledge of existing NumtS by almost 20%. We find that recent Numt insertions are derived from throughout the mitochondrial genome, including the D-loop, and have integration biases that differ in some respects from previous studies on older, fixed NumtS in the reference genome. We determined the complete inserted sequence for a subset of these events and have identified a number of nearly full-length mitochondrial genome insertions into nuclear chromosomes. We further define their age and origin of insertion and present an analysis of their potential impact to ongoing studies of mitochondrial heteroplasmy and disease.
Affiliation(s)
- Gargi Dayama
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Sarah B Emery
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Jeffrey M Kidd
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Ryan E Mills
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|