51
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Bhat V, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet 2024; 56:925-937. [PMID: 38658794 PMCID: PMC11669423 DOI: 10.1038/s41588-024-01726-6] [Received: 09/07/2023] [Accepted: 03/21/2024] [Indexed: 04/26/2024]
Abstract
CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confound the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
  - Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Sam Barkal
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Tian Yu
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Martin Jankowiak
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Yunzhuo Zhou
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Matthew Francoeur
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Quang Vinh Phan
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Zhijian Li
  - Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Manuel Tognon
  - Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Computer Science Department, University of Verona, Verona, Italy
- Lara Brown
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Michael I Love
  - Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Vineel Bhat
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Guillaume Lettre
  - Montreal Heart Institute, Montréal, Quebec, Canada
  - Faculté de Médecine, Université de Montréal, Montréal, Quebec, Canada
- David B Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Christopher A Cassa
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Richard I Sherwood
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Luca Pinello
  - Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
52
Wei A, Border R, Fu B, Cullina S, Brandes N, Jang SK, Sankararaman S, Kenny E, Udler MS, Ntranos V, Zaitlen N, Arboleda V. Investigating the sources of variable impact of pathogenic variants in monogenic metabolic conditions. medRxiv 2024:2023.09.14.23295564. [PMID: 37745486 PMCID: PMC10516069 DOI: 10.1101/2023.09.14.23295564] [Indexed: 09/26/2023]
Abstract
Over three percent of people carry a dominant pathogenic variant, yet only a fraction of carriers develop disease. Disease phenotypes from carriers of variants in the same gene range from mild to severe. Here, we investigate underlying mechanisms for this heterogeneity: variable variant effect sizes, carrier polygenic backgrounds, and modulation of carrier effect by genetic background (marginal epistasis). We leveraged exomes and clinical phenotypes from the UK Biobank and the Mt. Sinai BioMe Biobank to identify carriers of pathogenic variants affecting cardiometabolic traits. We employed recently developed methods to study these cohorts, observing strong statistical support and clinical translational potential for all three mechanisms of variable carrier penetrance and disease severity. For example, scores from our recent model of variant pathogenicity were tightly correlated with phenotype amongst clinical variant carriers, they predicted effects of variants of unknown significance, and they distinguished gain- from loss-of-function variants. We also found that polygenic scores predicted phenotypes amongst pathogenic carriers and that epistatic effects can exceed main carrier effects by an order of magnitude.
53
Livesey BJ, Badonyi M, Dias M, Frazer J, Kumar S, Lindorff-Larsen K, McCandlish DM, Orenbuch R, Shearer CA, Muffley L, Foreman J, Glazer AM, Lehner B, Marks DS, Roth FP, Rubin AF, Starita LM, Marsh JA. Guidelines for releasing a variant effect predictor. arXiv 2024:arXiv:2404.10807v1. [PMID: 38699161 PMCID: PMC11065047] [Indexed: 05/05/2024]
Abstract
Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.
Affiliation(s)
- Benjamin J. Livesey
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mihaly Badonyi
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mafalda Dias
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Jonathan Frazer
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sushant Kumar
  - Department of Medical Biophysics, University of Toronto; Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- David M. McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Rose Orenbuch
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Lara Muffley
  - Department of Genome Sciences, University of Washington and the Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Julia Foreman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ben Lehner
  - Wellcome Sanger Institute, Cambridge, UK; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Debora S. Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
- Frederick P. Roth
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Alan F. Rubin
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research; Department of Medical Biology, University of Melbourne, Parkville, Australia
- Lea M. Starita
  - Department of Genome Sciences, University of Washington and the Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Joseph A. Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
54
Rivas-González I, Tung J. A multi-million-year natural experiment: Comparative genomics on a massive scale and its implications for human health. Evol Med Public Health 2024; 12:67-70. [PMID: 38601345 PMCID: PMC11005778 DOI: 10.1093/emph/eoae006] [Received: 01/10/2024] [Revised: 03/18/2024] [Indexed: 04/12/2024]
Abstract
Improving the diversity and quality of genome assemblies for non-human mammals has been a long-standing goal of comparative genomics. The last year saw substantial progress towards this goal, including the release of genome alignments for 240 mammals and nearly half the primate order. These resources have increased our ability to identify evolutionarily constrained regions of the genome, and together strongly support the importance of these regions to biomedically relevant trait variation in humans. They also provide new strategies for identifying the genetic basis of changes unique to individual lineages, illustrating the value of evolutionary comparative approaches for understanding human health.
Affiliation(s)
- Iker Rivas-González
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Jenny Tung
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
  - Department of Biology, Duke University, Durham, NC, USA
  - Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
55
Kanthaswamy S. Review: Wildlife forensic genetics-Biological evidence, DNA markers, analytical approaches, and challenges. Anim Genet 2024; 55:177-192. [PMID: 38123142 DOI: 10.1111/age.13390] [Received: 12/02/2023] [Revised: 12/02/2023] [Accepted: 12/03/2023] [Indexed: 12/23/2023]
Abstract
Wildlife-related crimes are the second most prevalent lawbreaking offense globally. This illicit trade encompasses hunting, breeding and trafficking. Besides diminishing many species and their habitats and ecosystems, hindering the economic development of local communities that depend on them, undermining the rule of law and financing terrorism, various cross-species transmissions (zoonoses) of pathogens, including COVID-19, can be attributed to wildlife crimes. Wildlife forensics applies interdisciplinary scientific analyses to support law enforcement in investigating wildlife crimes. Its main objectives are to identify the taxonomic species in question, determine if a crime has been committed, link a suspect to the crime and support the conviction and prosecution of the perpetrator. This article reviews wildlife crime and its implications, wildlife forensic science investigation, common forms of wildlife biological evidence, including DNA, wildlife DNA techniques and challenges in wildlife forensic genetics. The article also reviews the contributions of genetic markers such as short tandem repeat (STR) and mitochondrial DNA (mtDNA) markers, which provide the probative genetic data representing the bulk of DNA evidence for solving wildlife crime. This review provides an overview of wildlife DNA databases, which are critical for searching and matching forensic DNA profiles and sequences and establishing how frequent forensic DNA profiles and sequences are in a particular population or geographic region. As such, this review will contain an in-depth analysis of the current status of wildlife forensic genetics, and it will be of general interest to wildlife and conservation biologists, law enforcement officers, and academics interested in combating crimes against wildlife using animal forensic DNA methods.
Affiliation(s)
- Sree Kanthaswamy
  - School of Interdisciplinary Forensics, Arizona State University, Tempe, Arizona, USA
  - California National Primate Research Center, University of California, Davis, California, USA
56
Kumar P, Sankaranarayanan R. When Paul Berg meets Donald Crothers: an achiral connection through protein biosynthesis. Nucleic Acids Res 2024; 52:2130-2141. [PMID: 38407292 PMCID: PMC10954443 DOI: 10.1093/nar/gkae117] [Received: 11/21/2023] [Revised: 02/02/2024] [Accepted: 02/09/2024] [Indexed: 02/27/2024]
Abstract
Outliers in scientific observations are often ignored and mostly remain unreported. However, presenting them is always beneficial since they could reflect the actual anomalies that might open new avenues. Here, we describe two examples of the above that came out of the laboratories of two of the pioneers of nucleic acid research in the area of protein biosynthesis, Paul Berg and Donald Crothers. Their work on the identification of D-aminoacyl-tRNA deacylase (DTD) and 'Discriminator hypothesis', respectively, were hugely ahead of their time and were partly against the general paradigm at that time. In both of the above works, the smallest and the only achiral amino acid turned out to be an outlier as DTD can act weakly on glycine charged tRNAs with a unique discriminator base of 'Uracil'. This peculiar nature of glycine remained an enigma for nearly half a century. With a load of available information on the subject by the turn of the century, our work on 'chiral proofreading' mechanisms during protein biosynthesis serendipitously led us to revisit these findings. Here, we describe how we uncovered an unexpected connection between them that has implications for evolution of different eukaryotic life forms.
Affiliation(s)
- Pradeep Kumar
  - CSIR–Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500007, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Rajan Sankaranarayanan
  - CSIR–Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500007, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
57
Schraiber JG, Edge MD, Pennell M. Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations. bioRxiv 2024:2024.02.10.579721. [PMID: 38496530 PMCID: PMC10942266 DOI: 10.1101/2024.02.10.579721] [Indexed: 03/19/2024]
Abstract
In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these two fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition for why and when spurious correlations may occur using analytical theory and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, including both the genetic relatedness matrix (GRM) as well as its leading eigenvectors, corresponding to the principal components of the genotype matrix, in a regression model, can mitigate spurious correlations in phylogenetic analyses. As a case study of this, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including covariance matrix eigenvectors as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
58
Saez-Matia A, Ibarluzea MG, M-Alicante S, Muguruza-Montero A, Nuñez E, Ramis R, Ballesteros OR, Lasa-Goicuria D, Fons C, Gallego M, Casis O, Leonardo A, Bergara A, Villarroel A. MLe-KCNQ2: An Artificial Intelligence Model for the Prognosis of Missense KCNQ2 Gene Variants. Int J Mol Sci 2024; 25:2910. [PMID: 38474157 DOI: 10.3390/ijms25052910] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 02/29/2024] [Indexed: 03/14/2024]
Abstract
Despite the increasing availability of genomic data and enhanced data analysis procedures, predicting the severity of associated diseases remains elusive in the absence of clinical descriptors. To address this challenge, we have focused on the KV7.2 voltage-gated potassium channel gene (KCNQ2), known for its link to developmental delays and various epilepsies, including self-limited benign familial neonatal epilepsy and epileptic encephalopathy. Genome-wide tools often exhibit a tendency to overestimate deleterious mutations, frequently overlooking tolerated variants, and lack the capacity to discriminate variant severity. This study introduces a novel approach by evaluating multiple machine learning (ML) protocols and descriptors. The combination of genomic information with a novel Variant Frequency Index (VFI) builds a robust foundation for constructing reliable gene-specific ML models. The ensemble model, MLe-KCNQ2, formed through logistic regression, support vector machine, random forest and gradient boosting algorithms, achieves specificity and sensitivity values surpassing 0.95 (AUC-ROC > 0.98). The ensemble MLe-KCNQ2 model also categorizes pathogenic mutations as benign or severe, with an area under the receiver operating characteristic curve (AUC-ROC) above 0.67. This study not only presents a transferable methodology for accurately classifying KCNQ2 missense variants, but also provides valuable insights for clinical counseling and aids in the determination of variant severity. The research context emphasizes the necessity of precise variant classification, especially for genes like KCNQ2, contributing to the broader understanding of gene-specific challenges in the field of genomic research. The MLe-KCNQ2 model stands as a promising tool for enhancing clinical decision making and prognosis in the realm of KCNQ2-related pathologies.
Affiliation(s)
- Markel G Ibarluzea
  - Physics Department, Universidad del País Vasco, UPV/EHU, 48940 Leioa, Spain
  - Donostia International Physics Center, 20018 Donostia, Spain
- Sara M-Alicante
  - Instituto Biofisika, CSIC-UPV/EHU, 48940 Leioa, Spain
  - Physics Department, Universidad del País Vasco, UPV/EHU, 48940 Leioa, Spain
- Eider Nuñez
  - Instituto Biofisika, CSIC-UPV/EHU, 48940 Leioa, Spain
  - Physics Department, Universidad del País Vasco, UPV/EHU, 48940 Leioa, Spain
- Rafael Ramis
  - Physics Department, Universidad del País Vasco, UPV/EHU, 48940 Leioa, Spain
  - Donostia International Physics Center, 20018 Donostia, Spain
- Oscar R Ballesteros
  - Physics Department, Universidad del País Vasco, UPV/EHU, 48940 Leioa, Spain
  - Centro de Física de Materiales CFM, CSIC-UPV/EHU, 20018 Donostia, Spain
- Carmen Fons
  - Pediatric Neurology Department, Sant Joan de Déu Hospital, Institut de Recerca Sant Joan de Déu, Barcelona University, 08950 Barcelona, Spain
- Mónica Gallego
  - Departamento de Fisiología, Universidad del País Vasco, UPV/EHU, 01006 Vitoria-Gasteiz, Spain
- Oscar Casis
  - Departamento de Fisiología, Universidad del País Vasco, UPV/EHU, 01006 Vitoria-Gasteiz, Spain
- Aritz Leonardo
  - Physics Department, Universidad del País Vasco, UPV/EHU, 48940 Leioa, Spain
  - Donostia International Physics Center, 20018 Donostia, Spain
- Aitor Bergara
  - Physics Department, Universidad del País Vasco, UPV/EHU, 48940 Leioa, Spain
  - Donostia International Physics Center, 20018 Donostia, Spain
  - Centro de Física de Materiales CFM, CSIC-UPV/EHU, 20018 Donostia, Spain
59
Meller A, Kelly D, Smith LG, Bowman GR. Toward physics-based precision medicine: Exploiting protein dynamics to design new therapeutics and interpret variants. Protein Sci 2024; 33:e4902. [PMID: 38358129 PMCID: PMC10868452 DOI: 10.1002/pro.4902] [Received: 09/01/2023] [Revised: 12/01/2023] [Accepted: 01/04/2024] [Indexed: 02/16/2024]
Abstract
The goal of precision medicine is to utilize our knowledge of the molecular causes of disease to better diagnose and treat patients. However, there is a substantial mismatch between the small number of Food and Drug Administration (FDA)-approved drugs and annotated coding variants compared to the needs of precision medicine. This review introduces the concept of physics-based precision medicine, a scalable framework that promises to improve our understanding of sequence-function relationships and accelerate drug discovery. We show that accounting for the ensemble of structures a protein adopts in solution with computer simulations overcomes many of the limitations imposed by assuming a single protein structure. We highlight studies of protein dynamics and recent methods for the analysis of structural ensembles. These studies demonstrate that differences in conformational distributions predict functional differences within protein families and between variants. Thanks to new computational tools that are providing unprecedented access to protein structural ensembles, this insight may enable accurate predictions of variant pathogenicity for entire libraries of variants. We further show that explicitly accounting for protein ensembles, with methods like alchemical free energy calculations or docking to Markov state models, can uncover novel lead compounds. To conclude, we demonstrate that cryptic pockets, or cavities absent in experimental structures, provide an avenue to target proteins that are currently considered undruggable. Taken together, our review provides a roadmap for the field of protein science to accelerate precision medicine.
Affiliation(s)
- Artur Meller
  - Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, Missouri, USA
  - Medical Scientist Training Program, Washington University in St. Louis, St. Louis, Missouri, USA
  - Departments of Biochemistry & Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Devin Kelly
  - Departments of Biochemistry & Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Louis G. Smith
  - Departments of Biochemistry & Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Gregory R. Bowman
  - Departments of Biochemistry & Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
60
Calame DG, Wong JH, Panda P, Nguyen DT, Leong NC, Sangermano R, Patankar SG, Abdel-Hamid M, AlAbdi L, Safwat S, Flannery KP, Dardas Z, Fatih JM, Murali C, Kannan V, Lotze TE, Herman I, Ammouri F, Rezich B, Efthymiou S, Alavi S, Murphy D, Firoozfar Z, Nasab ME, Bahreini A, Ghasemi M, Haridy NA, Goldouzi HR, Eghbal F, Karimiani EG, Srinivasan VM, Gowda VK, Du H, Jhangiani SN, Coban-Akdemir Z, Marafi D, Rodan L, Isikay S, Rosenfeld JA, Ramanathan S, Staton M, Oberg KC, Clark RD, Wenman C, Loughlin S, Saad R, Ashraf T, Male A, Tadros S, Boostani R, Abdel-Salam GM, Zaki M, Abdalla E, Manzini MC, Pehlivan D, Posey JE, Gibbs RA, Houlden H, Alkuraya FS, Bujakowska K, Maroofian R, Lupski JR, Nguyen LN. Biallelic variation in the choline and ethanolamine transporter FLVCR1 underlies a pleiotropic disease spectrum from adult neurodegeneration to severe developmental disorders. medRxiv 2024:2024.02.09.24302464. [PMID: 38405817 PMCID: PMC10888986 DOI: 10.1101/2024.02.09.24302464] [Indexed: 02/27/2024]
Abstract
FLVCR1 encodes Feline leukemia virus subgroup C receptor 1 (FLVCR1), a solute carrier (SLC) transporter within the Major Facilitator Superfamily. FLVCR1 is a widely expressed transmembrane protein with plasma membrane and mitochondrial isoforms implicated in heme, choline, and ethanolamine transport. While Flvcr1 knockout mice die in utero with skeletal malformations and defective erythropoiesis reminiscent of Diamond-Blackfan anemia, rare biallelic pathogenic FLVCR1 variants are linked to childhood or adult-onset neurodegeneration of the retina, spinal cord, and peripheral nervous system. We ascertained from research and clinical exome sequencing 27 individuals from 20 unrelated families with biallelic ultra-rare missense and predicted loss-of-function (pLoF) FLVCR1 variant alleles. We characterize an expansive FLVCR1 phenotypic spectrum ranging from adult-onset retinitis pigmentosa to severe developmental disorders with microcephaly, reduced brain volume, epilepsy, spasticity, and premature death. The most severely affected individuals, including three individuals with homozygous pLoF variants, share traits with Flvcr1 knockout mice and Diamond-Blackfan anemia including macrocytic anemia and congenital skeletal malformations. Pathogenic FLVCR1 missense variants primarily lie within transmembrane domains and reduce choline and ethanolamine transport activity compared with wild-type FLVCR1 with minimal impact on FLVCR1 stability or subcellular localization. Several variants disrupt splicing in a mini-gene assay which may contribute to genotype-phenotype correlations. Taken together, these data support an allele-specific gene dosage model in which phenotypic severity reflects residual FLVCR1 activity. This study expands our understanding of Mendelian disorders of choline and ethanolamine transport and demonstrates the importance of choline and ethanolamine in neurodevelopment and neuronal homeostasis.
Affiliation(s)
- Daniel G. Calame
  - Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
  - Texas Children’s Hospital, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jovi Huixin Wong
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228
- Puravi Panda
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228
- Dat Tuan Nguyen
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228
- Nancy C.P. Leong
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228
- Riccardo Sangermano
  - Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
- Sohil G. Patankar
  - Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
- Mohamed Abdel-Hamid
  - Medical Molecular Genetics Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo, Egypt
- Lama AlAbdi
  - Department of Zoology, College of Science, King Saud University, Riyadh, Saudi Arabia
  - Department of Translational Genomics, Center for Genomic Medicine, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
- Sylvia Safwat
  - Department of Genetics, Medical Research Institute, Alexandria University, Alexandria, Egypt
- Kyle P. Flannery
  - Department of Neuroscience and Cell Biology, Rutgers-Robert Wood Johnson Medical School, Child Health Institute of New Jersey, NY, USA
- Zain Dardas
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jawid M. Fatih
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Chaya Murali
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Varun Kannan
  - Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Timothy E. Lotze
  - Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Isabella Herman
  - Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
  - Texas Children’s Hospital, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Boys Town National Research Hospital, Boys Town, NE, USA
- Farah Ammouri
  - Boys Town National Research Hospital, Boys Town, NE, USA
  - The University of Kansas Health System, Westwood, KS, USA
- Brianna Rezich
  - Munroe-Meyer Institute for Genetics and Rehabilitation, University of Nebraska Medical Center, Omaha, NE, USA
- Stephanie Efthymiou
  - Department of Neuromuscular diseases, UCL Institute of Neurology, WC1N 3BG, London, UK
- Shahryar Alavi
  - Department of Neuromuscular diseases, UCL Institute of Neurology, WC1N 3BG, London, UK
- David Murphy
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, United Kingdom
- Amir Bahreini
  - KaryoGen, Isfahan, Iran
  - Department of Human Genetics, University of Pittsburgh, PA, USA
| | - Majid Ghasemi
- Department of Neurology, Isfahan University of Medical Sciences, Isfahan, Iran
| | | | - Hamid Reza Goldouzi
- Department of Pediatrics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Fatemeh Eghbal
- Department of Medical Genetics, Next Generation Genetic Polyclinic, Mashhad, Iran
| | - Ehsan Ghayoor Karimiani
- Molecular and Clinical Sciences Institute, St George’s, University of London, Cranmer Terrace London, London, UK
| | | | - Vykuntaraju K. Gowda
- Department of Pediatric Neurology, Indira Gandhi Institute of Child Health, Bangalore, India
| | - Haowei Du
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | | | - Zeynep Coban-Akdemir
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Dana Marafi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Pediatrics, Faculty of Medicine, Kuwait University, Kuwait
| | - Lance Rodan
- Department of Neurology, Boston Children’s Hospital, Boston, Massachusetts, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, Massachusetts, USA
| | - Sedat Isikay
- Gaziantep Islam Science and Technology University, Medical Faculty, Department of Pediatric Neurology, Gaziantep, Turkey
| | - Jill A. Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Baylor Genetics Laboratories, Houston, TX, USA
| | - Subhadra Ramanathan
- Division of Genetics, Department of Pediatrics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Michael Staton
- Division of Genetics, Department of Pediatrics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Kerby C. Oberg
- Department of Pathology and Human Anatomy, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Robin D. Clark
- Division of Genetics, Department of Pediatrics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Catharina Wenman
- Rare & Inherited Disease Laboratory, NHS North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, WC1N 3BH, UK
| | - Sam Loughlin
- Rare & Inherited Disease Laboratory, NHS North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, WC1N 3BH, UK
| | - Ramy Saad
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Tazeen Ashraf
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Alison Male
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Shereen Tadros
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Reza Boostani
- Department of Neurology, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Ghada M.H. Abdel-Salam
- Department of Clinical Genetics, Human Genetics and Genome Research Division, National Research Centre, Cairo, Egypt
| | - Maha Zaki
- Department of Clinical Genetics, Human Genetics and Genome Research Division, National Research Centre, Cairo, Egypt
| | - Ebtesam Abdalla
- Department of Genetics, Medical Research Institute, Alexandria University, Alexandria, Egypt
| | - M. Chiara Manzini
- Department of Neuroscience and Cell Biology, Rutgers-Robert Wood Johnson Medical School, Child Health Institute of New Jersey, NY, USA
| | - Davut Pehlivan
- Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Texas Children’s Hospital, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Jennifer E. Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Richard A. Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Henry Houlden
- Department of Neuromuscular diseases, UCL Institute of Neurology, WC1N 3BG, London, UK
| | - Fowzan S. Alkuraya
- Department of Translational Genomics, Center for Genomic Medicine, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
- Department of Pediatrics, Prince Sultan Military Medical City, Riyadh, Saudi Arabia
| | - Kinga Bujakowska
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Reza Maroofian
- Department of Neuromuscular diseases, UCL Institute of Neurology, WC1N 3BG, London, UK
| | - James R. Lupski
- Texas Children’s Hospital, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Long Nam Nguyen
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228
- Immunology Program, Life Sciences Institute, National University of Singapore, Singapore 117456
- Singapore Lipidomics Incubator (SLING), Life Sciences Institute, National University of Singapore, Singapore 117456
- Cardiovascular Disease Research (CVD) Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117545
- Immunology Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456
|
61
|
Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res 2024; 52:D1143-D1154. [PMID: 38183205 PMCID: PMC10767851 DOI: 10.1093/nar/gkad989] [Citation(s) in RCA: 42] [Impact Index Per Article: 42.0] [Received: 09/14/2023] [Revised: 10/14/2023] [Accepted: 10/17/2023] [Indexed: 01/07/2024] Open
Abstract
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Affiliation(s)
- Max Schubach
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
| | - Thorben Maass
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
| | - Lusiné Nazaretyan
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
| | - Sebastian Röner
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
| | - Martin Kircher
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
|
62
|
Fowler DM, Rehm HL. Will variants of uncertain significance still exist in 2030? Am J Hum Genet 2024; 111:5-10. [PMID: 38086381 PMCID: PMC10806733 DOI: 10.1016/j.ajhg.2023.11.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 08/22/2023] [Revised: 11/12/2023] [Accepted: 11/13/2023] [Indexed: 12/28/2023] Open
Abstract
In 2020, the National Human Genome Research Institute (NHGRI) made ten "bold predictions," including that "the clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation 'variant of uncertain significance (VUS)' obsolete." We discuss the prospects for this prediction, arguing that many, if not most, VUS in coding regions will be resolved by 2030. We outline a confluence of recent changes making this possible, especially advances in the standards for variant classification that better leverage diverse types of evidence, improvements in computational variant effect predictor performance, scalable multiplexed assays of variant effect capable of saturating the genome, and data-sharing efforts that will maximize the information gained from each new individual sequenced and variant interpreted. We suggest that clinicians and researchers can realize a future where VUSs have largely been eliminated, in line with the NHGRI's bold prediction. The length of time taken to reach this future, and thus whether we are able to achieve the goal of largely eliminating VUSs by 2030, is largely a consequence of the choices made now and in the next few years. We believe that investing in eliminating VUSs is worthwhile, since their predominance remains one of the biggest challenges to precision genomic medicine.
Affiliation(s)
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA; Department of Bioengineering, University of Washington, Seattle, WA, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
63
|
Orenbuch R, Kollasch AW, Spinner HD, Shearer CA, Hopf TA, Franceschi D, Dias M, Frazer J, Marks DS. Deep generative modeling of the human proteome reveals over a hundred novel genes involved in rare genetic disorders. RESEARCH SQUARE 2024:rs.3.rs-3740259. [PMID: 38260496 PMCID: PMC10802723 DOI: 10.21203/rs.3.rs-3740259/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in patients in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable.
Affiliation(s)
- Rose Orenbuch
- Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Aaron W. Kollasch
- Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Hansen D. Spinner
- Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Courtney A. Shearer
- Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | | | - Dinko Franceschi
- Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Mafalda Dias
- Dias & Frazer Group, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- University Pompeu Fabra, Barcelona, Spain
| | - Jonathan Frazer
- Dias & Frazer Group, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- University Pompeu Fabra, Barcelona, Spain
| | - Debora S. Marks
- Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
64
|
Gagan J. The Potential Utility of Large Language Models in Molecular Pathology. J Appl Lab Med 2024; 9:159-161. [PMID: 38167768 DOI: 10.1093/jalm/jfad102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 10/25/2023] [Indexed: 01/05/2024]
Affiliation(s)
- Jeffrey Gagan
- Department of Pathology, University of Texas Southwestern, Dallas, TX, United States
|
65
|
Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024; 25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Accepted: 07/11/2023] [Indexed: 09/06/2023]
Abstract
In population genetics, the emergence of large-scale genomic data for various species and populations has provided new opportunities to understand the evolutionary forces that drive genetic diversity using statistical inference. However, the era of population genomics presents new challenges in analysing the massive amounts of genomes and variants. Deep learning has demonstrated state-of-the-art performance for numerous applications involving large-scale data. Recently, deep learning approaches have gained popularity in population genetics; facilitated by the advent of massive genomic data sets, powerful computational hardware and complex deep learning architectures, they have been used to identify population structure, infer demographic history and investigate natural selection. Here, we introduce common deep learning architectures and provide comprehensive guidelines for implementing deep learning models for population genetic inference. We also discuss current challenges and future directions for applying deep learning in population genetics, focusing on efficiency, robustness and interpretability.
Affiliation(s)
- Xin Huang
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria.
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria.
| | - Aigerim Rymbekova
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
| | - Olga Dolgova
- Integrative Genomics Laboratory, CIC bioGUNE - Centro de Investigación Cooperativa en Biociencias, Derio, Biscaya, Spain
| | - Oscar Lao
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain.
| | - Martin Kuhlwilm
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria.
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria.
|
66
|
Kuderna LFK, Ulirsch JC, Rashid S, Ameen M, Sundaram L, Hickey G, Cox AJ, Gao H, Kumar A, Aguet F, Christmas MJ, Clawson H, Haeussler M, Janiak MC, Kuhlwilm M, Orkin JD, Bataillon T, Manu S, Valenzuela A, Bergman J, Rouselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath JE, Hvilsom C, Juan D, Frandsen P, Schraiber JG, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, Valsecchi J, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin AD, Guschanski K, Schierup MH, Beck RMD, Karakikes I, Wang KC, Umapathy G, Roos C, Boubli JP, Siepel A, Kundaje A, Paten B, Lindblad-Toh K, Rogers J, Marques Bonet T, Farh KKH. Identification of constrained sequence elements across 239 primate genomes. Nature 2024; 625:735-742. [PMID: 38030727 PMCID: PMC10808062 DOI: 10.1038/s41586-023-06798-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 02/09/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023]
Abstract
Noncoding DNA is central to our understanding of human gene regulation and complex diseases [1,2], and measuring the evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome [3-9]. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared to protein-coding DNA [10], the relatively short timescales separating primate species [11], and the previously limited availability of whole-genome sequences [12]. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals and validate their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.
Affiliation(s)
- Lukas F K Kuderna
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Jacob C Ulirsch
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Sabrina Rashid
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Mohamed Ameen
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Laksshman Sundaram
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Anthony J Cox
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Hong Gao
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Arvind Kumar
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Francois Aguet
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Matthew J Christmas
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Hiram Clawson
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | | | - Mareike C Janiak
- School of Science, Engineering and Environment, University of Salford, Salford, UK
| | - Martin Kuhlwilm
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
| | - Joseph D Orkin
- Département d'Anthropologie, Université de Montréal, Montréal, Quebec, Canada
| | - Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Shivakumara Manu
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
| | - Alejandro Valenzuela
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Juraj Bergman
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Section for Ecoinformatics and Biodiversity, Department of Biology, Aarhus University, Aarhus, Denmark
| | | | - Felipe Ennes Silva
- Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development, Tefé, Brazil
- Evolutionary Biology and Ecology (EBE), Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Lidia Agueda
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
| | - Julie Blanc
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
| | - Marta Gut
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
| | - Dorien de Vries
- School of Science, Engineering and Environment, University of Salford, Salford, UK
| | - Ian Goodhead
- School of Science, Engineering and Environment, University of Salford, Salford, UK
| | - R Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, Uppsala, Sweden
| | | | - Julie E Horvath
- North Carolina Museum of Natural Sciences, Raleigh, NC, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | | | - David Juan
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | | | - Joshua G Schraiber
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | | | - Fabrício Bertuol
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
| | - Hazel Byrne
- Department of Anthropology, University of Utah, Salt Lake City, UT, USA
| | | | - Izeni Farias
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
| | - João Valsecchi
- Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development, Tefé, Brazil
- Rede de Pesquisa em Diversidade, Conservação e Uso da Fauna da Amazônia - RedeFauna, Manaus, Brazil
- Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica-ComFauna, Iquitos, Peru
| | - Malu Messias
- Universidade Federal de Rondônia, Porto Velho, Brazil
| | | | - Mihir Trivedi
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
| | - Rogerio Rossi
- Instituto de Biociências, Universidade Federal do Mato Grosso, Cuiabá, Brazil
| | - Tomas Hrbek
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
- Department of Biology, Trinity University, San Antonio, TX, USA
| | - Nicole Andriaholinirina
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
| | - Clément J Rabarivola
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
| | - Alphonse Zaramody
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
| | - Clifford J Jolly
- Department of Anthropology, New York University, New York, NY, USA
| | - Jane Phillips-Conroy
- Department of Neuroscience, Washington University School of Medicine in St Louis, St Louis, MO, USA
| | - Gregory Wilkerson
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
| | - Christian Abee
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
| | - Joe H Simmons
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
| | | | - Sree Kanthaswamy
- School of Interdisciplinary Forensics, Arizona State University, Phoenix, AZ, USA
- California National Primate Research Center, University of California, Davis, CA, USA
| | - Fekadu Shiferaw
- Guinea Worm Eradication Program, The Carter Center Ethiopia, Addis Ababa, Ethiopia
| | - Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
| | - Long Zhou
- Center for Evolutionary and Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
| | - Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
| | - Guojie Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Evolutionary and Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Julius D Keyyu
- Tanzania Wildlife Research Institute (TAWIRI), Arusha, Tanzania
| | - Sascha Knauf
- Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Greifswald-Insel Riems, Germany
- Professorship for International Animal Health/One Health, Faculty of Veterinary Medicine, Justus Liebig University, Giessen, Germany
| | - Minh D Le
- Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi, Vietnam
| | - Esther Lizano
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Stefan Merker
- Department of Zoology, State Museum of Natural History Stuttgart, Stuttgart, Germany
| | - Arcadi Navarro
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Barcelonaβeta Brain Research Center, Pasqual Maragall Foundation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Tilo Nadler
- Cuc Phuong Commune, Nho Quan District, Vietnam
| | - Chiea Chuen Khor
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
| | | | - Patrick Tan
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore, Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
| | - Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore, Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
| | - Andrew C Kitchener
- Department of Natural Sciences, National Museums Scotland, Edinburgh, UK
- School of Geosciences, Edinburgh, UK
| | - Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen, Göttingen, Germany
- Leibniz ScienceCampus Primate Cognition, Göttingen, Germany
| | - Ivo Gut
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
| | - Amanda D Melin
- Department of Anthropology and Archaeology, University of Calgary, Calgary, Alberta, Canada
- Department of Medical Genetics, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
| | - Katerina Guschanski
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, Uppsala, Sweden
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
| | | | - Robin M D Beck
- School of Science, Engineering and Environment, University of Salford, Salford, UK
| | - Ioannis Karakikes
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, USA
| | - Kevin C Wang
- Department of Cancer Biology, Stanford University, Stanford, CA, USA
- Department of Dermatology, Stanford University School of Medicine, Stanford, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| | - Govindhaswamy Umapathy
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
| | - Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
| | - Jean P Boubli
- School of Science, Engineering and Environment, University of Salford, Salford, UK
| | - Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| | - Tomas Marques Bonet
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain.
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain.
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
- Universitat Pompeu Fabra, Barcelona, Spain.
| | - Kyle Kai-How Farh
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA.
|
67
|
Housman G, Tung J. Next-generation primate genomics: New genome assemblies unlock new questions. Cell 2023; 186:5433-5437. [PMID: 38065076 PMCID: PMC11283640 DOI: 10.1016/j.cell.2023.11.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/09/2023] [Accepted: 11/09/2023] [Indexed: 12/18/2023]
Abstract
Nonhuman primates provide unique evolutionary and comparative insight into the human phenotype. Genome assemblies are now available for nearly half of the species in the primate order, expanding our understanding of genetic variation within and between species and making important contributions to evolutionary biology, evolutionary anthropology, and human genetics.
Affiliation(s)
- Genevieve Housman
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Jenny Tung
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany; Department of Evolutionary Anthropology, Duke University, Durham, NC, USA; Department of Biology, Duke University, Durham, NC, USA; Canadian Institute for Advanced Research, Toronto, Canada; Faculty of the Life Sciences, University of Leipzig, Leipzig, Germany.
|
68
|
Marzi SJ, Schilder BM, Nott A, Frigerio CS, Willaime-Morawek S, Bucholc M, Hanger DP, James C, Lewis PA, Lourida I, Noble W, Rodriguez-Algarra F, Sharif JA, Tsalenchuk M, Winchester LM, Yaman Ü, Yao Z, Ranson JM, Llewellyn DJ. Artificial intelligence for neurodegenerative experimental models. Alzheimers Dement 2023; 19:5970-5987. [PMID: 37768001 DOI: 10.1002/alz.13479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/11/2023] [Accepted: 08/14/2023] [Indexed: 09/29/2023]
Abstract
INTRODUCTION: Experimental models are essential tools in neurodegenerative disease research. However, the translation of insights and drugs discovered in model systems has proven immensely challenging, marred by high failure rates in human clinical trials. METHODS: Here we review the application of artificial intelligence (AI) and machine learning (ML) in experimental medicine for dementia research. RESULTS: Considering the specific challenges of reproducibility and translation between other species or model systems and human biology in preclinical dementia research, we highlight best practices and resources that can be leveraged to quantify and evaluate translatability. We then evaluate how AI and ML approaches could be applied to enhance both cross-model reproducibility and translation to human biology, while sustaining biological interpretability. DISCUSSION: AI and ML approaches in experimental medicine remain in their infancy. However, they have great potential to strengthen preclinical research and translation if based upon adequate, robust, and reproducible experimental data. HIGHLIGHTS: There are increasing applications of AI in experimental medicine. We identified issues in reproducibility, cross-species translation, and data curation in the field. Our review highlights data resources and AI approaches as solutions. Multi-omics analysis with AI offers exciting future possibilities in drug discovery.
Affiliation(s)
- Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | - Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | - Alexi Nott
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | | | | | - Magda Bucholc
- School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
| | - Diane P Hanger
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | | | - Patrick A Lewis
- Royal Veterinary College, London, UK
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
| | | | - Wendy Noble
- Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
| | | | - Jalil-Ahmad Sharif
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | - Maria Tsalenchuk
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | | | - Ümran Yaman
- UK Dementia Research Institute at UCL, London, UK
| | | | | | - David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- Alan Turing Institute, London, UK
|
69
|
Miga KH, Eichler EE. Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes. Am J Hum Genet 2023; 110:1832-1840. [PMID: 37922882 PMCID: PMC10645551 DOI: 10.1016/j.ajhg.2023.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 11/07/2023] Open
Abstract
Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation. Complete genetic variant discovery will transform how we map, catalog, and associate variation with human disease and fundamentally change our understanding of the genetic diversity of all humans.
Affiliation(s)
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
|
70
|
Ljungdahl A, Kohani S, Page NF, Wells ES, Wigdor EM, Dong S, Sanders SJ. AlphaMissense is better correlated with functional assays of missense impact than earlier prediction algorithms. bioRxiv 2023:2023.10.24.562294. [PMID: 37961354 PMCID: PMC10634779 DOI: 10.1101/2023.10.24.562294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Missense variants that alter a single amino acid in the encoded protein contribute to many human disorders but pose a substantial challenge in interpretation. Though these variants can be reliably identified through sequencing, distinguishing the clinically significant ones remains difficult, such that "Variants of Unknown Significance" outnumber those classified as "Pathogenic" or "Likely Pathogenic." Numerous in silico approaches have been developed to predict the functional impact of missense variants to inform clinical interpretation, the latest being AlphaMissense, which uses artificial intelligence methods trained on predicted protein structure. To independently assess the performance of AlphaMissense and 38 other predictors of missense severity, we compared predictions to data from multiplexed assays of variant effect (MAVE). MAVE experiments generate almost every possible individual amino acid change in a gene and measure their functional impact using a high-throughput assay. Assessing 17,696 variants across five genes (DDX3X, MSH2, PTEN, KCNQ4, and BRCA1), we find that AlphaMissense is consistently one of the top five algorithms based on correlation with functional impact and is the best-correlated algorithm for two genes. We conclude that AlphaMissense represents the current best-in-class predictor by this metric; however, the improvement over other algorithms is modest. We note that multiple missense predictors, including AlphaMissense, appear to overcall variants as pathogenic despite minimal functional impact and that substantially more high-quality training data, including consistently analyzed patient cohorts and MAVE analyses, are required to improve accuracy.
Affiliation(s)
- Alicia Ljungdahl
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Sayeh Kohani
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Nicholas F. Page
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Eloise S. Wells
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Emilie M. Wigdor
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Shan Dong
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Stephan J. Sanders
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- New York Genome Center, New York, NY 10013, USA
|
71
|
Xie MJ, Cromie GA, Owens K, Timour MS, Tang M, Kutz JN, El-Hattab AW, McLaughlin RN, Dudley AM. Constructing and interpreting a large-scale variant effect map for an ultrarare disease gene: Comprehensive prediction of the functional impact of PSAT1 genotypes. PLoS Genet 2023; 19:e1010972. [PMID: 37812589 PMCID: PMC10561871 DOI: 10.1371/journal.pgen.1010972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 09/13/2023] [Indexed: 10/11/2023] Open
Abstract
Reduced activity of the enzymes encoded by PHGDH, PSAT1, and PSPH causes a set of ultrarare, autosomal recessive diseases known as serine biosynthesis defects. These diseases present in a broad phenotypic spectrum: at the severe end is Neu-Laxova syndrome, in the intermediate range are infantile serine biosynthesis defects with severe neurological manifestations and growth deficiency, and at the mild end is childhood disease with intellectual disability. However, L-serine supplementation, especially if started early, can ameliorate and in some cases even prevent symptoms. Therefore, knowledge of pathogenic variants can improve clinical outcomes. Here, we use a yeast-based assay to individually measure the functional impact of 1,914 SNV-accessible amino acid substitutions in PSAT. Results of our assay agree well with clinical interpretations and protein structure-function relationships, supporting the inclusion of our data as functional evidence as part of the ACMG variant interpretation guidelines. We use existing ClinVar variants, disease alleles reported in the literature and variants present as homozygotes in the primAD database to define assay ranges that could aid clinical variant interpretation for up to 98% of the tested variants. In addition to measuring the functional impact of individual variants in yeast haploid cells, we also assay pairwise combinations of PSAT1 alleles that recapitulate human genotypes, including compound heterozygotes, in yeast diploids. Results from our diploid assay successfully distinguish the genotypes of affected individuals from those of healthy carriers and agree well with disease severity. Finally, we present a linear model that uses individual allele measurements to predict the biallelic function of ~1.8 million allele combinations corresponding to potential human genotypes. 
Taken together, our work provides an example of how large-scale functional assays in model systems can be powerfully applied to the study of ultrarare diseases.
Affiliation(s)
- Michael J. Xie
- Pacific Northwest Research Institute, Seattle, Washington, United States of America
- Molecular Engineering Graduate Program, University of Washington, Seattle, Washington, United States of America
| | - Gareth A. Cromie
- Pacific Northwest Research Institute, Seattle, Washington, United States of America
| | - Katherine Owens
- Pacific Northwest Research Institute, Seattle, Washington, United States of America
- Department of Applied Mathematics, University of Washington, Seattle, Washington, United States of America
| | - Martin S. Timour
- Pacific Northwest Research Institute, Seattle, Washington, United States of America
| | - Michelle Tang
- Pacific Northwest Research Institute, Seattle, Washington, United States of America
| | - J. Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, Washington, United States of America
| | - Ayman W. El-Hattab
- Department of Clinical Sciences, College of Medicine, University of Sharjah, Sharjah, United Arab Emirates
| | | | - Aimée M. Dudley
- Pacific Northwest Research Institute, Seattle, Washington, United States of America
- Molecular Engineering Graduate Program, University of Washington, Seattle, Washington, United States of America
|
72
|
Abstract
Machine-learning algorithm uses structure prediction to spot disease-causing mutations.
Affiliation(s)
- Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
| | - Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Theory of Condensed Matter, Cavendish Laboratory, University of Cambridge, Cambridge, UK
|
73
|
McBride DJ, Fielding C, Newington T, Vatsiou A, Fischl H, Bajracharya M, Thomson VS, Fraser LJ, Fujita PA, Becq J, Kingsbury Z, Ross MT, Moat SJ, Morgan S. Whole-Genome Sequencing Can Identify Clinically Relevant Variants from a Single Sub-Punch of a Dried Blood Spot Specimen. Int J Neonatal Screen 2023; 9:52. [PMID: 37754778 PMCID: PMC10532340 DOI: 10.3390/ijns9030052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 09/01/2023] [Accepted: 09/06/2023] [Indexed: 09/28/2023] Open
Abstract
The collection of dried blood spots (DBS) facilitates newborn screening for a variety of rare, but very serious conditions in healthcare systems around the world. Sub-punches of varying sizes (1.5-6 mm) can be taken from DBS specimens to use as inputs for a range of biochemical assays. Advances in DNA sequencing workflows allow whole-genome sequencing (WGS) libraries to be generated directly from inputs such as peripheral blood, saliva, and DBS. We compared WGS metrics obtained from libraries generated directly from DBS to those generated from DNA extracted from peripheral blood, the standard input for this type of assay. We explored the flexibility of DBS as an input for WGS by altering the punch number and size as inputs to the assay. We showed that WGS libraries can be successfully generated from a variety of DBS inputs, including a single 3 mm or 6 mm diameter punch, with equivalent data quality observed across a number of key metrics of importance in the detection of gene variants. We observed no difference in the performance of DBS and peripheral-blood-extracted DNA in the detection of likely pathogenic gene variants in samples taken from individuals with cystic fibrosis or phenylketonuria. WGS can be performed directly from DBS and is a powerful method for the rapid discovery of clinically relevant, disease-causing gene variants.
Affiliation(s)
| | - Stuart J. Moat
- Wales Newborn Screening Laboratory, University Hospital of Wales, Cardiff CF14 4XW, UK
- School of Medicine, Cardiff University, Cardiff CF14 4XW, UK
| | - Sian Morgan
- All Wales Genetics Laboratory, University Hospital of Wales, Cardiff CF14 4XW, UK
|
74
|
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. medRxiv 2023:2023.09.08.23295253. [PMID: 37732177 PMCID: PMC10508837 DOI: 10.1101/2023.09.08.23295253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/22/2023]
Abstract
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Sam Barkal
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Tian Yu
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | | | - Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Matthew Francoeur
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Quang Vinh Phan
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Zhijian Li
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Manuel Tognon
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science Department, University of Verona, Verona, Italy
| | - Lara Brown
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Michael I. Love
- Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Guillaume Lettre
- Montreal Heart Institute, Montréal, QC H1T 1C8, Canada
- Faculté de Médecine, Université de Montréal, Montréal, QC H3T 1J4, Canada
| | - David B. Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Christopher A. Cassa
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Richard I. Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Luca Pinello
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
|
75
|
Aradhya S, Facio FM, Metz H, Manders T, Colavin A, Kobayashi Y, Nykamp K, Johnson B, Nussbaum RL. Applications of artificial intelligence in clinical laboratory genomics. Am J Med Genet C Semin Med Genet 2023; 193:e32057. [PMID: 37507620 DOI: 10.1002/ajmg.c.32057] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/15/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Affiliation(s)
- Swaroop Aradhya
- Invitae Corporation, San Francisco, California, USA
- Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
| | | | - Hillery Metz
- Invitae Corporation, San Francisco, California, USA
| | - Toby Manders
- Invitae Corporation, San Francisco, California, USA
| | | | | | - Keith Nykamp
- Invitae Corporation, San Francisco, California, USA
| | | | - Robert L Nussbaum
- Invitae Corporation, San Francisco, California, USA
- Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA
|
76
|
Biancolella M, Colona VL, Luzzatto L, Watt JL, Mattiuz G, Conticello SG, Kaminski N, Mehrian-Shai R, Ko AI, Gonsalves GS, Vasiliou V, Novelli G, Reichardt JKV. COVID-19 annual update: a narrative review. Hum Genomics 2023; 17:68. [PMID: 37488607 PMCID: PMC10367267 DOI: 10.1186/s40246-023-00515-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/08/2023] [Accepted: 07/16/2023] [Indexed: 07/26/2023] Open
Abstract
Three and a half years after the pandemic outbreak, now that WHO has formally declared that the emergency is over, COVID-19 is still a significant global issue. Here, we focus on recent developments in genetic and genomic research on COVID-19, and we give an outlook on state-of-the-art therapeutic approaches, as the pandemic is gradually transitioning to an endemic situation. The sequencing and characterization of rare alleles in different populations has made it possible to identify numerous genes that affect either susceptibility to COVID-19 or the severity of the disease. These findings provide a beginning to new avenues and pan-ethnic therapeutic approaches, as well as to potential genetic screening protocols. The causative virus, SARS-CoV-2, is still in the spotlight, but a novel threatening virus could appear anywhere at any time. Therefore, continued vigilance and further research are warranted. We also note emphatically that to prevent future pandemics and other world-wide health crises, it is imperative to capitalize on what we have learnt from COVID-19: specifically, regarding its origins, the world's response, and insufficient preparedness. This requires unprecedented international collaboration and timely data sharing for the coordination of effective response and the rapid implementation of containment measures.
Affiliation(s)
| | - Vito Luigi Colona
- Department of Biomedicine and Prevention, School of Medicine and Surgery, Tor Vergata University of Rome, Via Montpellier 1, 00133, Rome, Italy
| | - Lucio Luzzatto
- Department of Haematology and Blood Transfusion, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania
- University of Florence, 50121, Florence, Italy
| | - Jessica Lee Watt
- College of Public Health, Medical and Veterinary Sciences, James Cook University, Smithfield, QLD, 4878, Australia
| | | | - Silvestro G Conticello
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Florence, Italy
- Institute of Clinical Physiology - National Council of Research (IFC-CNR), 56124, Pisa, Italy
| | - Naftali Kaminski
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, USA
| | - Ruty Mehrian-Shai
- Pediatric Hemato-Oncology, Edmond and Lilly Safra Children's Hospital, Sheba Medical Center, Tel Hashomer 2 Sheba Road, 52621, Ramat Gan, Israel
| | - Albert I Ko
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, USA
- Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil
| | - Gregg S Gonsalves
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
| | - Vasilis Vasiliou
- Department of Environmental Health Sciences, School of Public Health, Yale University, New Haven, USA
| | - Giuseppe Novelli
- Department of Biomedicine and Prevention, School of Medicine and Surgery, Tor Vergata University of Rome, Via Montpellier 1, 00133, Rome, Italy.
- IRCCS Neuromed, 86077, Pozzilli, IS, Italy.
- Department of Pharmacology, School of Medicine, University of Nevada, 89557, Reno, NV, USA.
| | - Juergen K V Reichardt
- Australian Institute of Tropical Health and Medicine, James Cook University, Smithfield, QLD, 4878, Australia
|
77
|
Eichler EE. Sampling a wide swathe of primate genetic diversity. Cell Genomics 2023; 3:100358. [PMID: 37492108 PMCID: PMC10363911 DOI: 10.1016/j.xgen.2023.100358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2023]
Abstract
Two studies published in Science report the deepest survey of primate genetic diversity using short-read sequencing to sample ∼47% of extant species. Kuderna et al.1 investigate genetic diversity, mutation rates, and our primate phylogeny, while Gao et al.2 use the data to better classify disease-causing mutations.
Affiliation(s)
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
78
|
Fiziev PP, McRae J, Ulirsch JC, Dron JS, Hamp T, Yang Y, Wainschtein P, Ni Z, Schraiber JG, Gao H, Cable D, Field Y, Aguet F, Fasnacht M, Metwally A, Rogers J, Marques-Bonet T, Rehm HL, O'Donnell-Luria A, Khera AV, Farh KKH. Rare penetrant mutations confer severe risk of common diseases. Science 2023; 380:eabo1131. [PMID: 37262146 DOI: 10.1126/science.abo1131] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 01/14/2022] [Accepted: 03/16/2023] [Indexed: 06/03/2023]
Abstract
We examined 454,712 exomes for genes associated with a wide spectrum of complex traits and common diseases and observed that rare, penetrant mutations in genes implicated by genome-wide association studies confer ~10-fold larger effects than common variants in the same genes. Consequently, an individual at the phenotypic extreme and at the greatest risk for severe, early-onset disease is better identified by a few rare penetrant variants than by the collective action of many common variants with weak effects. By combining rare variants across phenotype-associated genes into a unified genetic risk model, we demonstrate superior portability across diverse global populations compared with common-variant polygenic risk scores, greatly improving the clinical utility of genetic-based risk prediction.
Affiliation(s)
- Petko P Fiziev
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Jeremy McRae
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Jacob C Ulirsch
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Jacqueline S Dron
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Tobias Hamp
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Yanshen Yang
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Pierrick Wainschtein
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Zijian Ni
- Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Joshua G Schraiber
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Hong Gao
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Dylan Cable
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA
- Yair Field
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Francois Aguet
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Marc Fasnacht
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Ahmed Metwally
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI 53715, USA
- Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), 08003 Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193 Barcelona, Spain
- Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
- Amit V Khera
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Verve Therapeutics, Cambridge, MA 02215, USA
- Kyle Kai-How Farh
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
|