1
Kernohan KD, Boycott KM. The expanding diagnostic toolbox for rare genetic diseases. Nat Rev Genet 2024; 25:401-415. [PMID: 38238519 DOI: 10.1038/s41576-023-00683-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 11/22/2023] [Indexed: 05/23/2024]
Abstract
Genomic technologies, such as targeted, exome and short-read genome sequencing approaches, have revolutionized the care of patients with rare genetic diseases. However, more than half of patients remain without a diagnosis. Emerging approaches from research-based settings such as long-read genome sequencing and optical genome mapping hold promise for improving the identification of disease-causal genetic variants. In addition, new omic technologies that measure the transcriptome, epigenome, proteome or metabolome are showing great potential for variant interpretation. As genetic testing options rapidly expand, the clinical community needs to be mindful of their individual strengths and limitations, as well as remaining challenges, to select the appropriate diagnostic test, correctly interpret results and drive innovation to address insufficiencies. If used effectively - through truly integrative multi-omics approaches and data sharing - the resulting large quantities of data from these established and emerging technologies will greatly improve the interpretative power of genetic and genomic diagnostics for rare diseases.
Affiliation(s)
- Kristin D Kernohan
- CHEO Research Institute, University of Ottawa, Ottawa, ON, Canada
- Newborn Screening Ontario, CHEO, Ottawa, ON, Canada
| | - Kym M Boycott
- CHEO Research Institute, University of Ottawa, Ottawa, ON, Canada.
- Department of Genetics, CHEO, Ottawa, ON, Canada.
2
Stoneman HR, Price A, Trout NS, Lamont R, Tifour S, Pozdeyev N, Crooks K, Lin M, Rafaels N, Gignoux CR, Marker KM, Hendricks AE. Characterizing substructure via mixture modeling in large-scale genetic summary statistics. bioRxiv 2024:2024.01.29.577805. [PMID: 38766180 PMCID: PMC11100604 DOI: 10.1101/2024.01.29.577805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into groups masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted substructure limits summary data usability, especially for understudied or admixed populations. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to estimate and adjust for substructure in genetic summary data. In extensive simulations and application to public data, Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and identifies potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse publicly available summary data, resulting in improved and more equitable research.
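As a rough illustration of the mixture idea described in this abstract, the sketch below estimates a single mixing proportion for two reference groups by least squares on allele frequencies. It is not the Summix2 implementation: the function name, the restriction to two groups, and the toy frequencies are all assumptions made for illustration.

```python
def estimate_mixture_proportion(observed, ref_a, ref_b):
    """Least-squares estimate of pi in observed ~ pi*ref_a + (1 - pi)*ref_b.

    Hypothetical helper for illustration only. Each argument is a list of
    per-variant allele frequencies, given in the same variant order.
    """
    if not (len(observed) == len(ref_a) == len(ref_b)):
        raise ValueError("all frequency lists must have the same length")
    # Closed-form minimiser of sum((o - b - pi*(a - b))**2) over pi.
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference groups are identical; pi is not identifiable")
    # Clip to [0, 1] so the estimate is a valid proportion.
    return min(1.0, max(0.0, num / den))


# Toy data (made up): summary frequencies built as a 30/70 mix of two groups.
ref_a = [0.10, 0.50, 0.90, 0.30]
ref_b = [0.40, 0.20, 0.60, 0.80]
observed = [0.3 * a + 0.7 * b for a, b in zip(ref_a, ref_b)]
pi_hat = estimate_mixture_proportion(observed, ref_a, ref_b)
print(round(pi_hat, 3))
```

With more than two reference groups the same objective would need a constrained optimiser (proportions non-negative and summing to one); the two-group case is used here only because it has a closed form.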
Affiliation(s)
- Hayley R Stoneman
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Adelle Price
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| | - Nikole Scribner Trout
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| | - Riley Lamont
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| | - Souha Tifour
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| | - Nikita Pozdeyev
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Kristy Crooks
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Department of Pathology, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Meng Lin
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Nicholas Rafaels
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Christopher R Gignoux
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Katie M Marker
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Audrey E Hendricks
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
3
Artomov M, Loboda AA, Artyomov MN, Daly MJ. Public platform with 39,472 exome control samples enables association studies without genotype sharing. Nat Genet 2024; 56:327-335. [PMID: 38200129 PMCID: PMC10864173 DOI: 10.1038/s41588-023-01637-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2018] [Accepted: 12/01/2023] [Indexed: 01/12/2024]
Abstract
Acquiring a sufficiently powered cohort of control samples matched to a case sample can be time-consuming or, in some cases, impossible. Accordingly, an ability to leverage genetic data from control samples that were already collected elsewhere could dramatically improve power in genetic association studies. Sharing of control samples can pose significant challenges, since most human genetic data are subject to strict sharing regulations. Here, using the properties of singular value decomposition and a subsampling algorithm, we developed a method that selects the best-matching controls from an external pool of samples, complies with personal data protection, and eliminates the need for genotype sharing. We provide access to a library of 39,472 exome sequencing controls at http://dnascore.net, enabling association studies for case cohorts lacking control subjects. Using this approach, control sets can be selected from this online library with a prespecified matching accuracy, ensuring well-calibrated association analysis for both rare and common variants.
Affiliation(s)
- Mykyta Artomov
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA.
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Broad Institute, Cambridge, MA, USA.
| | - Alexander A Loboda
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
- ITMO University, St. Petersburg, Russia
- Almazov National Medical Research Center, St. Petersburg, Russia
| | - Maxim N Artyomov
- Department of Immunology and Pathology, Washington University in St. Louis, St. Louis, MO, USA
| | - Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Broad Institute, Cambridge, MA, USA.
- Institute for Molecular Medicine Finland, Helsinki, Finland.
4
Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, Kenny EE, Pasaniuc B, Witte JS, Ge T. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024; 25:8-25. [PMID: 37620596 PMCID: PMC10961971 DOI: 10.1038/s41576-023-00637-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Accepted: 07/11/2023] [Indexed: 08/26/2023]
Abstract
Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
Affiliation(s)
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Jibril Hirbo
- Department of Medicine Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Iman Martin
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - John S Witte
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA.
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University, Stanford, CA, USA.
| | - Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
5
Katki HA, Berndt SI, Machiela MJ, Stewart DR, Garcia-Closas M, Kim J, Shi J, Yu K, Rothman N. Increase in power by obtaining 10 or more controls per case when type-1 error is small in large-scale association studies. BMC Med Res Methodol 2023; 23:153. [PMID: 37386403 PMCID: PMC10308790 DOI: 10.1186/s12874-023-01973-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 06/10/2023] [Indexed: 07/01/2023]
Abstract
BACKGROUND The rule of thumb that there is little gain in statistical power by obtaining more than 4 controls per case is based on type-1 error α = 0.05. However, association studies that evaluate thousands or millions of associations use smaller α and may have access to plentiful controls. We investigate power gains, and reductions in p-values, when increasing well beyond 4 controls per case, for small α. METHODS We calculate the power, the median expected p-value, and the minimum detectable odds-ratio (OR), as a function of the number of controls/case, as α decreases. RESULTS As α decreases, at each ratio of controls per case, the increase in power is larger than for α = 0.05. For α between 10^-6 and 10^-9 (typical for thousands or millions of associations), increasing from 4 controls per case to 10-50 controls per case increases power. For example, a study with power = 0.2 (α = 5 × 10^-8) with 1 control/case has power = 0.65 with 4 controls/case, but with 10 controls/case has power = 0.78, and with 50 controls/case has power = 0.84. For situations where obtaining more than 4 controls per case provides small increases in power beyond 0.9 (at small α), the expected p-value can decrease by orders of magnitude below α. Increasing from 1 to 4 controls/case reduces the minimum detectable OR toward the null by 20.9%, and from 4 to 50 controls/case reduces it by an additional 9.7%, a result which applies regardless of α and hence also applies to "regular" α = 0.05 epidemiology. CONCLUSIONS At small α, versus 4 controls/case, recruiting 10 or more controls/case can increase power, reduce the expected p-value by 1-2 orders of magnitude, and meaningfully reduce the minimum detectable OR. These benefits of increasing the controls/case ratio increase as the number of cases increases, although the amount of benefit depends on exposure frequencies and true OR.
Provided that controls are comparable to cases, our findings suggest greater sharing of comparable controls in large-scale association studies.
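The arithmetic behind the figures in this abstract can be sketched with a simple normal approximation. The code below is not the authors' calculation: it assumes a two-sided Wald test on the log-OR whose standard error scales as sqrt(1 + 1/k) for k controls per case, it ignores the far rejection tail, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_ratio(k):
    """Standard error of the log-OR at k controls/case, relative to 1 control/case."""
    return sqrt((1 + 1 / k) / 2)


def power_at_ratio(power_at_one, k, alpha):
    """Approximate power at k controls/case, given the power at 1 control/case."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Noncentrality implied by the stated power at 1 control/case.
    ncp_one = z_crit + _STD_NORMAL.inv_cdf(power_at_one)
    # A smaller standard error scales the noncentrality up.
    return _STD_NORMAL.cdf(ncp_one / se_ratio(k) - z_crit)


def detectable_log_or_reduction(k_from, k_to):
    """Fractional shrinkage of the minimum detectable log-OR from k_from to k_to."""
    return 1 - se_ratio(k_to) / se_ratio(k_from)


for k in (4, 10, 50):
    print(k, round(power_at_ratio(0.2, k, 5e-8), 2))
print(round(detectable_log_or_reduction(1, 4), 3))
print(round(detectable_log_or_reduction(4, 50), 3))
```

Under these assumptions the loop gives values close to the 0.65, 0.78, and 0.84 quoted above, and the two reductions come out near 20.9% and 9.7% on the log-OR scale. The reduction depends only on the controls/case ratio, which is consistent with the abstract's remark that the result holds regardless of α.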
Affiliation(s)
- Hormuzd A Katki
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mitchell J Machiela
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Douglas R Stewart
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jung Kim
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jianxin Shi
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kai Yu
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nathaniel Rothman
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
6
Matalon DR, Zepeda-Mendoza CJ, Aarabi M, Brown K, Fullerton SM, Kaur S, Quintero-Rivera F, Vatta M. Clinical, technical, and environmental biases influencing equitable access to clinical genetics/genomics testing: A points to consider statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2023; 25:100812. [PMID: 37058144 DOI: 10.1016/j.gim.2023.100812] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/07/2023] [Accepted: 02/07/2023] [Indexed: 04/15/2023]
Affiliation(s)
- Dena R Matalon
- Division of Medical Genetics, Department of Pediatrics, Stanford Medicine, Stanford University, Stanford, CA
| | - Cinthya J Zepeda-Mendoza
- Divisions of Hematopathology and Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
| | - Mahmoud Aarabi
- UPMC Medical Genetics and Genomics Laboratories, UPMC Magee-Womens Hospital, Pittsburgh, PA; Departments of Pathology and Obstetrics, Gynecology and Reproductive Sciences, University of Pittsburgh School of Medicine, Pittsburgh, PA
| | | | - Stephanie M Fullerton
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA; Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA
| | - Shagun Kaur
- Department of Child Health, Phoenix Children's Hospital, University of Arizona College of Medicine-Phoenix, Phoenix, AZ
| | - Fabiola Quintero-Rivera
- Division of Genetic and Genomic Medicine, Departments of Pathology, Laboratory Medicine, and Pediatrics, University of California Irvine, Irvine, CA
7
López-López D, Roldán G, Fernández-Rueda JL, Bostelmann G, Carmona R, Aquino V, Perez-Florido J, Ortuño F, Pita G, Núñez-Torres R, González-Neira A, Peña-Chilet M, Dopazo J. A crowdsourcing database for the copy-number variation of the Spanish population. Hum Genomics 2023; 17:20. [PMID: 36894999 PMCID: PMC9997023 DOI: 10.1186/s40246-023-00466-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/12/2022] [Accepted: 02/25/2023] [Indexed: 03/11/2023]
Abstract
BACKGROUND Despite being a very common type of genetic variation, the distribution of copy-number variations (CNVs) in the population is still poorly understood. Knowledge of genetic variability, especially at the level of the local population, is a critical factor for distinguishing pathogenic from non-pathogenic variation in the discovery of new disease variants. RESULTS Here, we present the SPAnish Copy Number Alterations Collaborative Server (SPACNACS), which currently contains copy number variation profiles obtained from more than 400 genomes and exomes of unrelated Spanish individuals. By means of a collaborative crowdsourcing effort, whole genome and whole exome sequencing data, produced by local genomic projects and for other purposes, are continuously collected. Once both the Spanish ancestry and the lack of kinship with other individuals in SPACNACS have been checked, CNVs are inferred from these sequences and used to populate the database. A web interface allows querying the database with different filters that include ICD10 upper categories. This allows discarding samples from the disease under study and obtaining pseudo-control CNV profiles from the local population. We also show here additional studies on the local impact of CNVs in some phenotypes and on pharmacogenomic variants. SPACNACS can be accessed at: http://csvs.clinbioinfosspa.es/spacnacs/. CONCLUSION SPACNACS facilitates disease gene discovery by providing detailed information on the local variability of the population and exemplifies how to reuse genomic data produced for other purposes to build a local reference database.
Affiliation(s)
- Daniel López-López
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
- Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, Seville, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Gema Roldán
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
| | - Jose L Fernández-Rueda
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
| | - Gerrit Bostelmann
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
| | - Rosario Carmona
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Virginia Aquino
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
| | - Javier Perez-Florido
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
- Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, Seville, Spain
| | - Francisco Ortuño
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
- Department of Computer Architecture and Computer Technology, University of Granada, 18071, Granada, Spain
| | - Guillermo Pita
- Human Genotyping Unit-CeGen, Spanish National Cancer Research Centre (CNIO), 28029, Madrid, Spain
| | - Rocío Núñez-Torres
- Human Genotyping Unit-CeGen, Spanish National Cancer Research Centre (CNIO), 28029, Madrid, Spain
| | - Anna González-Neira
- Human Genotyping Unit-CeGen, Spanish National Cancer Research Centre (CNIO), 28029, Madrid, Spain
| | | | - María Peña-Chilet
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
- Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, Seville, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Joaquin Dopazo
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
- Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, Seville, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- FPS/ELIXIR-ES, Andalusian Public Foundation Progress and Health-FPS, 41013, Seville, Spain
8
Duchen D, Vergara C, Thio CL, Kundu P, Chatterjee N, Thomas DL, Wojcik GL, Duggal P. Pathogen exposure misclassification can bias association signals in GWAS of infectious diseases when using population-based common control subjects. Am J Hum Genet 2023; 110:336-348. [PMID: 36649706 PMCID: PMC9943744 DOI: 10.1016/j.ajhg.2022.12.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 12/20/2022] [Indexed: 01/18/2023]
Abstract
Genome-wide association studies (GWASs) have been performed to identify host genetic factors for a range of phenotypes, including for infectious diseases. The use of population-based common control subjects from biobanks and extensive consortia is a valuable resource to increase sample sizes in the identification of associated loci with minimal additional expense. Non-differential misclassification of the outcome has been reported when the control subjects are not well characterized, which often attenuates the true effect size. However, for infectious diseases the comparison of affected subjects to population-based common control subjects regardless of pathogen exposure can also result in selection bias. Through simulated comparisons of pathogen-exposed cases and population-based common control subjects, we demonstrate that not accounting for pathogen exposure can result in biased effect estimates and spurious genome-wide significant signals. Further, the observed association can be distorted depending upon strength of the association between a locus and pathogen exposure and the prevalence of pathogen exposure. We also used a real data example from the hepatitis C virus (HCV) genetic consortium comparing HCV spontaneous clearance to persistent infection with both well-characterized control subjects and population-based common control subjects from the UK Biobank. We find biased effect estimates for known HCV clearance-associated loci and potentially spurious HCV clearance associations. These findings suggest that the choice of control subjects is especially important for infectious diseases or outcomes that are conditional upon environmental exposures.
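The selection bias described in this abstract can be illustrated with a small deterministic calculation. This is a toy sketch, not the authors' simulation: it assumes an allele-level model in which a locus changes only the chance of pathogen exposure and has no effect on the outcome among exposed people, and every number and function name is made up for illustration.

```python
def allele_odds_ratio(freq_cases, freq_controls):
    """Allelic odds ratio from the alt-allele frequency in cases and controls."""
    return (freq_cases / (1 - freq_cases)) / (freq_controls / (1 - freq_controls))


def exposed_allele_freq(pop_freq, exp_given_alt, exp_given_ref):
    """Alt-allele frequency among exposed people under an allele-level model."""
    alt = pop_freq * exp_given_alt
    ref = (1 - pop_freq) * exp_given_ref
    return alt / (alt + ref)


pop_freq = 0.30       # alt-allele frequency in the whole population (made up)
exp_given_alt = 0.15  # P(exposed | alt allele), made up
exp_given_ref = 0.10  # P(exposed | ref allele), made up

# The outcome is assumed independent of the allele once a person is exposed,
# so cases (exposed, with the outcome) share the allele frequency of all
# exposed people.
freq_cases = exposed_allele_freq(pop_freq, exp_given_alt, exp_given_ref)

# Population-based controls ignore exposure; exposed controls condition on it.
or_population_controls = allele_odds_ratio(freq_cases, pop_freq)
or_exposed_controls = allele_odds_ratio(freq_cases, freq_cases)
print(or_population_controls, or_exposed_controls)
```

In this toy model the odds ratio against population controls equals the ratio of the two exposure probabilities (1.5 here) even though the locus has no effect on the outcome, while the comparison against exposed controls returns 1. That mirrors the abstract's point that the distortion depends on how strongly a locus is associated with exposure.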
Affiliation(s)
- Dylan Duchen
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Candelaria Vergara
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Chloe L Thio
- Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
| | - Prosenjit Kundu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - David L Thomas
- Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
| | - Genevieve L Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Priya Duggal
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
9
Mew M, Caldwell KA, Caldwell GA. From bugs to bedside: functional annotation of human genetic variation for neurological disorders using invertebrate models. Hum Mol Genet 2022; 31:R37-R46. [PMID: 35994032 PMCID: PMC9585664 DOI: 10.1093/hmg/ddac203] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/29/2022] [Revised: 08/11/2022] [Accepted: 08/17/2022] [Indexed: 02/02/2023]
Abstract
The exponential accumulation of DNA sequencing data has opened new avenues for discovering the causative roles of single-nucleotide polymorphisms (SNPs) in neurological diseases. The opportunities emerging from this are staggering, yet only as good as our abilities to glean insights from this surplus of information. Whereas computational biology continues to improve with respect to predictions and molecular modeling, the differences between in silico and in vivo analysis remain substantial. Invertebrate in vivo model systems represent technically advanced, experimentally mature, high-throughput, efficient and cost-effective resources for investigating a disease. With a decades-long track record of enabling investigators to discern function from DNA, fly (Drosophila) and worm (Caenorhabditis elegans) models have never been better poised to serve as living engines of discovery. Both of these animals have already proven useful in the classification of genetic variants as either pathogenic or benign across a range of neurodevelopmental and neurodegenerative disorders, including autism spectrum disorders, ciliopathies, amyotrophic lateral sclerosis, Alzheimer's and Parkinson's disease. Pathogenic SNPs typically display distinctive phenotypes in functional assays when compared with null alleles and frequently lead to protein products with gain-of-function or partial loss-of-function properties that contribute to neurological disease pathogenesis. The utility of invertebrates is logically limited by overt differences in anatomical and physiological characteristics, and also the evolutionary distance in genome structure. Nevertheless, functional annotation of disease-SNPs using invertebrate models can expedite the process of assigning cellular and organismal consequences to mutations, yield insights into mechanisms of action, and accelerate therapeutic target discovery and drug development for neurological conditions.
Affiliation(s)
- Melanie Mew
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, AL 35487, USA
| | - Kim A Caldwell
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, AL 35487, USA
- Alabama Research Institute on Aging, The University of Alabama, Tuscaloosa, AL 35487, USA
- Center for Convergent Bioscience and Medicine, The University of Alabama, Tuscaloosa, AL 35487, USA
- Departments of Neurobiology and Neurology, Center for Neurodegeneration and Experimental Therapeutics, Nathan Shock Center of Excellence for Research in the Basic Biology of Aging, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Guy A Caldwell
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, AL 35487, USA
- Center for Convergent Bioscience and Medicine, The University of Alabama, Tuscaloosa, AL 35487, USA
- Departments of Neurobiology and Neurology, Center for Neurodegeneration and Experimental Therapeutics, Nathan Shock Center of Excellence for Research in the Basic Biology of Aging, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
10
Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986 PMCID: PMC9582646 DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022]
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods that use variant annotations or external controls to improve statistical power, and particular challenges facing rare-variant analysis, such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare-variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodology investigation.
Affiliation(s)
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- *Correspondence: Wenan Chen; Brandon J. Coombes; Nicholas B. Larson
| | - Brandon J. Coombes
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
| | - Nicholas B. Larson
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
| |
|
11
|
Wang Q, Qin T, Tan H, Ding X, Lin X, Li J, Lin Z, Sun L, Lin H, Chen W. Broadening the genotypic and phenotypic spectrum of MAF in three Chinese Han congenital cataracts families. Am J Med Genet A 2022; 188:2888-2898. [PMID: 36097645 DOI: 10.1002/ajmg.a.62947] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/21/2022] [Revised: 06/06/2022] [Accepted: 06/19/2022] [Indexed: 01/31/2023]
Abstract
Pathogenic variants in the v-maf avian musculoaponeurotic fibrosarcoma oncogene homologue (MAF), which encodes a transcription factor from a unique subclass of basic leucine zipper transcription factors, are associated with isolated congenital cataracts (CCs) and Aymé-Gripp syndrome (AYGRPS). We collected detailed disease histories from 269 patients with CCs and performed comprehensive ophthalmic and systemic examinations; we then performed whole-exome sequencing. Pathogenicity was assessed using multiple predictive tools. The clinical validities of the reported gene-disease relationships for MAF (MAF-CCs and MAF-AYGRPS) were assessed using the ClinGen gene curation framework. We identified two novel (c.173C>A, p.Thr58Asn and c.947T>C, p.Leu316Pro) variants and one known (c.173C>T, p.Thr58Ile) MAF missense variant in three patients. We described novel phenotypes including cleft palate, macular hypoplasia, and retinal neovascularization in the peripheral avascular area and analyzed the genotype-phenotype correlations. We demonstrated associations of variants in the MAF C-terminal DNA-binding domain with CCs and associations of variants in the N-terminal transactivation domain of MAF with AYGRPS. We thus expand the genotypic and phenotypic spectrum of the MAF gene. The ClinGen gene curation framework results suggested that variants in different domains of MAF are associated with different diseases.
Affiliation(s)
- Qiwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| | - Tingfeng Qin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| | | | - Xiaoyan Ding
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| | - Xiaoshan Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| | - Jing Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| | - Zhuolin Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| | - Limei Sun
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| | - Haotian Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| | - Weirong Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Centre for Ocular Diseases, Guangzhou, China
| |
|