1
|
Abondio P, Bruno F. Single-cell pan-omics, environmental neurology, and artificial intelligence: the time for holistic brain health research. Neural Regen Res 2025; 20:1703-1704. [PMID: 39104102 PMCID: PMC11688541 DOI: 10.4103/nrr.nrr-d-24-00324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2024] [Revised: 05/07/2024] [Accepted: 05/22/2024] [Indexed: 08/07/2024] Open
Affiliation(s)
- Paolo Abondio
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy
| | - Francesco Bruno
- Association for Neurogenetic Research (ARN), Lamezia Terme, Catanzaro, Italy
| |
Collapse
|
2
|
Gao Y, Yang L, Kuhn K, Li W, Zanton G, Bowman M, Zhao P, Zhou Y, Fang L, Cole JB, Rosen BD, Ma L, Li C, Baldwin RL, Van Tassell CP, Zhang Z, Smith TPL, Liu GE. Long read and preliminary pangenome analyses reveal breed-specific structural variations and novel sequences in Holstein and Jersey cattle. J Adv Res 2025:S2090-1232(25)00258-9. [PMID: 40258473 DOI: 10.1016/j.jare.2025.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2024] [Revised: 04/06/2025] [Accepted: 04/10/2025] [Indexed: 04/23/2025] Open
Abstract
INTRODUCTION Most SV studies in livestock rely on short-read sequencing, posing challenges in accurately characterizing large genomic variants due to their limited read length. OBJECTIVES Our goal is to reveal structural variation and novel sequences specific to Holstein and Jersey cattle breeds using long-read and pan-genome analyses. METHODS We sequenced 20 Holsteins and 8 Jersey cattle using PacBio HiFi to 20×, and integrated five read-based and one assembly-based SV caller to determine SVs. RESULTS We assembled the 28 genomes averaging 3.25 Gb with a contig N50 of 69.36 Mb and using the ARS-UCD1.2 reference, we acquired Holstein/Jersey SV catalogs with 74,068/54,689 events spanning 202/135 Mb (7.43 %/4.97 % of the genome). SVs were enriched in less conserved, non-coding, and non-regulatory regions. Comparing Holsteins with differing feed efficiency (FE), SVs unique to high FE were linked to energy metabolism and olfactory receptors, while those specific to low FE were associated with material transport. We constructed Holstein/Jersey pangenome graphs with 148,598/105,875 nodes and 208,891/147,990 edges, representing 47,028/37,137 biallelic and multi-allelic events, and 63.75/42.34 Mb of novel sequence. We observed SV count saturation with 20 Holsteins, while adding Jerseys significantly increased the SV count, highlighting breed-specific SV events. CONCLUSION Our long-read data and SV catalogs are valuable resources, revealing that the cattle genome is more complex than previously thought.
Collapse
Affiliation(s)
- Yahui Gao
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China; Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
| | - Liu Yang
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
| | - Kristen Kuhn
- USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
| | - Wenli Li
- US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
| | - Geoffrey Zanton
- US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
| | - Mary Bowman
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
| | - Pengju Zhao
- Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China.
| | - Yang Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
| | - Lingzhao Fang
- Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark.
| | - John B Cole
- Council on Dairy Cattle Breeding, 4201 Northview Dr, Bowie, MD 20716, USA; Department of Animal Sciences, Donald Henry Barron Reproductive and Perinatal Biology Research Program, and the Genetics Institute, University of Florida, Gainesville, FL 32611-0910, USA; Department of Animal Science, North Carolina State University, Raleigh, NC 27695-7621, USA.
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
| | - Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
| | - Congjun Li
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
| | - Ransom L Baldwin
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
| | - Curtis P Van Tassell
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
| | - Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Timothy P L Smith
- USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
| |
Collapse
|
3
|
Groza C, Ge B, Cheung WA, Pastinen T, Bourque G. Expanded methylome and quantitative trait loci detection by long-read profiling of personal DNA. Genome Res 2025; 35:644-652. [PMID: 40113263 PMCID: PMC12047246 DOI: 10.1101/gr.279240.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2024] [Accepted: 02/11/2025] [Indexed: 03/22/2025]
Abstract
Structural variants (SVs) are omnipresent in human DNA, yet their genotype and methylation statuses are rarely characterized due to previous limitations in genome assembly and detection of modified nucleotides. Also, the extent to which SVs act as methylation quantitative trait loci (SV-mQTLs) is largely unknown. Here, we generated a pangenome graph summarizing SVs in 782 de novo assemblies obtained from Genomic Answers for Kids, capturing 14.6 million CpG dinucleotides that are absent from the CHM13v2 reference (SV-CpGs), thus expanding their number by 43.6%. Using 435 methylomes, we genotyped 4.06 million SV-CpGs, of which 3.93 million (96.8%) are methylated at least once. Nonrepeat sequences contribute 1.59 × 106 novel SV-CpGs, followed by centromeric satellites (6.57 × 105), simple repeats (5.40 × 105), Alu elements (5.07 × 105), satellites (2.17 × 105), LINE-1s (1.83 × 105), and SVA (SINE-VNTR-Alu) elements (1.50 × 105). Centromeric satellites, simple repeats, and SVAs are overrepresented in SV-CpGs versus reference CpGs. Similarly, methylation levels in SV-CpGs are more variable than in reference CpGs. To explore if SVs are potentially causal for functional variation, we measured SV-mQTLs. This revealed over 230,464 methylation bins where the methylation is associated with common SVs within 100 kbp. Finally, we identified 65,659 methylation bins (28.5%) where the leading QTL variant is an SV. In conclusion, we demonstrate that graph pangenomes provide full SV structures, the associated methylation variation, and reveal tens of thousands of SV-mQTLs, underscoring the importance of assembly based analyses of human traits.
Collapse
Affiliation(s)
- Cristian Groza
- Université de Montréal, Montréal Heart Institute, Montréal, Québec H1T 1C8, Canada
| | - Bing Ge
- McGill University, McGill University and Genome Quebec Innovation Centre, Montréal, Québec H3A 2T8, Canada
| | - Warren A Cheung
- Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA
| | - Tomi Pastinen
- Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA;
| | - Guillaume Bourque
- McGill University, Human Genetics, Montréal, Québec H3A 0C7, Canada;
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec H3A 2R7, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec H3A 0G1, Canada
| |
Collapse
|
4
|
Del Gobbo GF, Boycott KM. The additional diagnostic yield of long-read sequencing in undiagnosed rare diseases. Genome Res 2025; 35:559-571. [PMID: 39900460 PMCID: PMC12047273 DOI: 10.1101/gr.279970.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2025]
Abstract
Long-read sequencing (LRS) is a promising technology positioned to study the significant proportion of rare diseases (RDs) that remain undiagnosed as it addresses many of the limitations of short-read sequencing, detecting and clarifying additional disease-associated variants that may be missed by the current standard diagnostic workflow for RDs. Some key areas where additional diagnostic yields may be realized include: (1) detection and resolution of structural variants (SVs); (2) detection and characterization of tandem repeat expansions; (3) coverage of regions of high sequence similarity; (4) variant phasing; (5) the use of de novo genome assemblies for reference-based or graph genome variant detection; and (6) epigenetic and transcriptomic evaluations. Examples from over 50 studies support that the main areas of added diagnostic yield currently lie in SV detection and characterization, repeat expansion assessment, and phasing (with or without DNA methylation information). Several emerging studies applying LRS in cohorts of undiagnosed RDs also demonstrate that LRS can boost diagnostic yields following negative standard-of-care clinical testing and provide an added yield of 7%-17% following negative short-read genome sequencing. With this evidence of improved diagnostic yield, we discuss the incorporation of LRS into the diagnostic care pathway for undiagnosed RDs, including current challenges and considerations, with the ultimate goal of ending the diagnostic odyssey for countless individuals with RDs.
Collapse
Affiliation(s)
- Giulia F Del Gobbo
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada K1H 5B2
| | - Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada K1H 5B2;
- Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Ontario, Canada K1H 8L1
| |
Collapse
|
5
|
Harris L, McDonagh EM, Zhang X, Fawcett K, Foreman A, Daneck P, Sergouniotis PI, Parkinson H, Mazzarotto F, Inouye M, Hollox EJ, Birney E, Fitzgerald T. Genome-wide association testing beyond SNPs. Nat Rev Genet 2025; 26:156-170. [PMID: 39375560 DOI: 10.1038/s41576-024-00778-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/03/2024] [Indexed: 10/09/2024]
Abstract
Decades of genetic association testing in human cohorts have provided important insights into the genetic architecture and biological underpinnings of complex traits and diseases. However, for certain traits, genome-wide association studies (GWAS) for common SNPs are approaching signal saturation, which underscores the need to explore other types of genetic variation to understand the genetic basis of traits and diseases. Copy number variation (CNV) is an important source of heritability that is well known to functionally affect human traits. Recent technological and computational advances enable the large-scale, genome-wide evaluation of CNVs, with implications for downstream applications such as polygenic risk scoring and drug target identification. Here, we review the current state of CNV-GWAS, discuss current limitations in resource infrastructure that need to be overcome to enable the wider uptake of CNV-GWAS results, highlight emerging opportunities and suggest guidelines and standards for future GWAS for genetic variation beyond SNPs at scale.
Collapse
Affiliation(s)
- Laura Harris
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Ellen M McDonagh
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Xiaolei Zhang
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Katherine Fawcett
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Department of Population Health Sciences, University of Leicester, Leicester, UK
| | - Amy Foreman
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Petr Daneck
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Panagiotis I Sergouniotis
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
| | - Helen Parkinson
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Francesco Mazzarotto
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- National Heart and Lung Institute, Imperial College London, London, UK
| | - Michael Inouye
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia
| | - Edward J Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | - Ewan Birney
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Tomas Fitzgerald
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK.
| |
Collapse
|
6
|
Negi S, Stenton SL, Berger SI, Canigiula P, McNulty B, Violich I, Gardner J, Hillaker T, O'Rourke SM, O'Leary MC, Carbonell E, Austin-Tse C, Lemire G, Serrano J, Mangilog B, VanNoy G, Kolmogorov M, Vilain E, O'Donnell-Luria A, Délot E, Miga KH, Monlong J, Paten B. Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. Am J Hum Genet 2025; 112:428-449. [PMID: 39862869 PMCID: PMC11866955 DOI: 10.1016/j.ajhg.2025.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2024] [Revised: 12/22/2024] [Accepted: 01/02/2025] [Indexed: 01/27/2025] Open
Abstract
More than 50% of families with suspected rare monogenic diseases remain unsolved after whole-genome analysis by short-read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing and providing haplotype-resolved methylation profiling. To evaluate LRS's additional diagnostic yield, we sequenced a rare-disease cohort of 98 samples from 41 families, using nanopore sequencing, achieving per sample ∼36× average coverage and 32-kb read N50 from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. LRS covered, on average, coding exons in ∼280 genes and ∼5 known Mendelian disease-associated genes that were not covered by SRS. In comparison to SRS, LRS detected additional rare, functionally annotated variants, including structural variants (SVs) and tandem repeats, and completely phased 87% of protein-coding genes. LRS detected additional de novo variants and could be used to distinguish postzygotic mosaic variants from prezygotic de novos. Diagnostic variants were established by LRS in 11 probands, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates LRS's potential to enhance diagnostic yield for rare monogenic diseases, implying utility in future clinical genomics workflows.
Collapse
Affiliation(s)
- Shloka Negi
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Sarah L Stenton
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Seth I Berger
- Children's National Research Institute, Washington, DC, USA
| | | | - Brandy McNulty
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ivo Violich
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Joshua Gardner
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Todd Hillaker
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Sara M O'Rourke
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Melanie C O'Leary
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Elizabeth Carbonell
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Christina Austin-Tse
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Gabrielle Lemire
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Jillian Serrano
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Brian Mangilog
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Grace VanNoy
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Eric Vilain
- Institute for Clinical and Translational Science, University of California, Irvine, Irvine, CA, USA
| | - Anne O'Donnell-Luria
- Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Emmanuèle Délot
- Institute for Clinical and Translational Science, University of California, Irvine, Irvine, CA, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France.
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA.
| |
Collapse
|
7
|
Vecchio D, Panfili FM, Macchiaiolo M, Dentici ML, Trivisano M, Medina CB, Capolino R, Salzano E, Cortellessa F, Busè M, Pantaleo A, Cocciadiferro D, Gonfiantini MV, Niceta M, De Dominicis A, Specchio N, Piccione M, Digilio MC, Tartaglia M, Novelli A, Bartuli A. Molecular and clinical Insights into KMT2E-Related O'Donnell-Luria-Rodan syndrome in a novel patient cohort. Eur J Med Genet 2025; 73:104990. [PMID: 39709003 DOI: 10.1016/j.ejmg.2024.104990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2024] [Revised: 12/12/2024] [Accepted: 12/19/2024] [Indexed: 12/23/2024]
Abstract
O'Donnell-Luria-Rodan (ODLURO) syndrome is an autosomal dominant neurodevelopmental disorder mainly characterized by global development delay/intellectual disability, white matter abnormalities, and behavioral manifestations. It is caused by pathogenic variants in the KMT2E gene. Here we report seven new patients with loss-of-function KMT2E variants, six harboring frameshift/nonsense changes, and one with a 7q22.3 microdeletion encompassing the entire gene-locus. We further characterize both the clinical phenotype as well as its associated pathogenic variants' spectrum providing new information on sex-related phenotype distribution, according to the variant groups. We also highlight different epilepsy phenotype-genotype correlation with preferential association of generalized epilepsy and/or developmental and epileptic encephalopathy with missense pathogenic variants and focal epilepsy, childhood absence epilepsy and/or febrile seizures with pathogenic truncating variants and structural rearrangements. By a systematic review of the previously reported series, we also discuss previously unappreciated findings, including progressive macrocephaly, apraxia, and higher risk of bone fractures.
Collapse
Affiliation(s)
- Davide Vecchio
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy.
| | - Filippo M Panfili
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| | - Marina Macchiaiolo
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| | - Maria Lisa Dentici
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| | - Marina Trivisano
- Epilepsy and Movement Disorders Unit, Bambino Gesù Children's Hospital, IRCCS, Full Member of European Reference Network EpiCARE, Rome, Italy
| | - Carolina Benitez Medina
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| | - Rossella Capolino
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| | - Emanuela Salzano
- Medical Genetics Unit, Department of Genetics, Oncohaematology and Rare Diseases, AOOR Villa Sofia-Cervello, Palermo, Italy
| | - Fabiana Cortellessa
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| | - Martina Busè
- Medical Genetics Unit, Department of Genetics, Oncohaematology and Rare Diseases, AOOR Villa Sofia-Cervello, Palermo, Italy
| | - Antonio Pantaleo
- Medical Genetics Unit, Department of Genetics, Oncohaematology and Rare Diseases, AOOR Villa Sofia-Cervello, Palermo, Italy
| | - Dario Cocciadiferro
- Translational Cytogenomics Research Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Michaela V Gonfiantini
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| | - Marcello Niceta
- Molecular Genetics and Functional Genomics Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Angela De Dominicis
- Epilepsy and Movement Disorders Unit, Bambino Gesù Children's Hospital, IRCCS, Full Member of European Reference Network EpiCARE, Rome, Italy
| | - Nicola Specchio
- Epilepsy and Movement Disorders Unit, Bambino Gesù Children's Hospital, IRCCS, Full Member of European Reference Network EpiCARE, Rome, Italy
| | - Maria Piccione
- Medical Genetics Unit, Department of Genetics, Oncohaematology and Rare Diseases, AOOR Villa Sofia-Cervello, Palermo, Italy; Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, University of Palermo, Palermo, Italy
| | - Maria Cristina Digilio
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| | - Marco Tartaglia
- Molecular Genetics and Functional Genomics Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Antonio Novelli
- Translational Cytogenomics Research Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Andrea Bartuli
- Rare Diseases and Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCSS, Rome, Italy
| |
Collapse
|
8
|
Xia S, Chen J, Arsala D, Emerson JJ, Long M. Functional innovation through new genes as a general evolutionary process. Nat Genet 2025; 57:295-309. [PMID: 39875578 DOI: 10.1038/s41588-024-02059-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2024] [Accepted: 12/15/2024] [Indexed: 01/30/2025]
Abstract
In the past decade, our understanding of how new genes originate in diverse organisms has advanced substantially, and more than a dozen molecular mechanisms for generating initial gene structures were identified, in addition to gene duplication. These new genes have been found to integrate into and modify pre-existing gene networks primarily through mutation and selection, revealing new patterns and rules with stable origination rates across various organisms. This progress has challenged the prevailing belief that new proteins evolve from pre-existing genes, as new genes may arise de novo from noncoding DNA sequences in many organisms, with high rates observed in flowering plants. New genes have important roles in phenotypic and functional evolution across diverse biological processes and structures, with detectable fitness effects of sexual conflict genes that can shape species divergence. Such knowledge of new genes can be of translational value in agriculture and medicine.
Collapse
Affiliation(s)
- Shengqian Xia
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
| | - Jianhai Chen
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
| | - Deanna Arsala
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
| | - J J Emerson
- Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, USA
| | - Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA.
| |
Collapse
|
9
|
Hiatt SM, Lawlor JMJ, Handley LH, Latner DR, Bonnstetter ZT, Finnila CR, Thompson ML, Boston LB, Williams M, Rodriguez Nunez I, Jenkins J, Kelley WV, Bebin EM, Lopez MA, Hurst ACE, Korf BR, Schmutz J, Grimwood J, Cooper GM. Long-read genome sequencing and variant reanalysis increase diagnostic yield in neurodevelopmental disorders. Genome Res 2024; 34:1747-1762. [PMID: 39299904 PMCID: PMC11610584 DOI: 10.1101/gr.279227.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Accepted: 08/19/2024] [Indexed: 09/22/2024]
Abstract
Variant detection from long-read genome sequencing (lrGS) has proven to be more accurate and comprehensive than variant detection from short-read genome sequencing (srGS). However, the rate at which lrGS can increase molecular diagnostic yield for rare disease is not yet precisely characterized. We performed lrGS using Pacific Biosciences "HiFi" technology on 96 short-read-negative probands with rare diseases that were suspected to be genetic. We generated hg38-aligned variants and de novo phased genome assemblies, and subsequently annotated, filtered, and curated variants using clinical standards. New disease-relevant or potentially relevant genetic findings were identified in 16/96 (16.7%) probands, nine of which (8/96, ∼9.4%) harbored pathogenic or likely pathogenic variants. Nine probands (∼9.4%) had variants that were accurately called in both srGS and lrGS and represent changes to clinical interpretation, mostly from recently published gene-disease associations. Seven cases included variants that were only correctly interpreted in lrGS, including copy-number variants (CNVs), an inversion, a mobile element insertion, two low-complexity repeat expansions, and a 1 bp deletion. While evidence for each of these variants is, in retrospect, visible in srGS, they were either not called within srGS data, were represented by calls with incorrect sizes or structures, or failed quality control and filtration. Thus, while reanalysis of older srGS data clearly increases diagnostic yield, we find that lrGS allows for substantial additional yield (7/96, 7.3%) beyond srGS. We anticipate that as lrGS analysis improves, and as lrGS data sets grow allowing for better variant-frequency annotation, the additional lrGS-only rare disease yield will grow over time.
Collapse
Affiliation(s)
- Susan M Hiatt
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA;
| | - James M J Lawlor
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Lori H Handley
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Donald R Latner
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | | | - Candice R Finnila
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | | | - Lori Beth Boston
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Melissa Williams
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | | | - Jerry Jenkins
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Whitley V Kelley
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - E Martina Bebin
- Department of Neurology, University of Alabama at Birmingham, Birmingham, Alabama 35924, USA
| | - Michael A Lopez
- Department of Neurology, University of Alabama at Birmingham, Birmingham, Alabama 35924, USA
- Department of Pediatrics, University of Alabama at Birmingham, Birmingham, Alabama 35924, USA
- Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35924, USA
| | - Anna C E Hurst
- Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35924, USA
| | - Bruce R Korf
- Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35924, USA
| | - Jeremy Schmutz
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Jane Grimwood
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Gregory M Cooper
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA;
| |
Collapse
|
10
|
Groza C, Chen X, Wheeler TJ, Bourque G, Goubert C. A unified framework to analyze transposable element insertion polymorphisms using graph genomes. Nat Commun 2024; 15:8915. [PMID: 39414821 PMCID: PMC11484939 DOI: 10.1038/s41467-024-53294-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Accepted: 10/02/2024] [Indexed: 10/18/2024] Open
Abstract
Transposable elements are ubiquitous mobile DNA sequences generating insertion polymorphisms, contributing to genomic diversity. We present GraffiTE, a flexible pipeline to analyze polymorphic mobile elements insertions. By integrating state-of-the-art structural variant detection algorithms and graph genomes, GraffiTE identifies polymorphic mobile elements from genomic assemblies or long-read sequencing data, and genotypes these variants using short or long read sets. Benchmarking on simulated and real datasets reports high precision and recall rates. GraffiTE is designed to allow non-expert users to perform comprehensive analyses, including in models with limited transposable element knowledge and is compatible with various sequencing technologies. Here, we demonstrate the versatility of GraffiTE by analyzing human, Drosophila melanogaster, maize, and Cannabis sativa pangenome data. These analyses reveal the landscapes of polymorphic mobile elements and their frequency variations across individuals, strains, and cultivars.
Collapse
Affiliation(s)
- Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, QC, Canada
| | - Xun Chen
- Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto, Japan
| | - Travis J Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
| | - Guillaume Bourque
- Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto, Japan
- Canadian Centre for Computational Genomics, McGill University, Montréal, QC, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada
- Human Genetics, McGill University, Montréal, QC, Canada
| | - Clément Goubert
- Human Genetics, McGill University, Montréal, QC, Canada.
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA.
| |
Collapse
|
11
|
Köroğlu Ç, Chen P, Traurig M, Altok S, Bogardus C, Baier LJ. De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences. Genome Biol Evol 2024; 16:evae188. [PMID: 39190003 PMCID: PMC11384899 DOI: 10.1093/gbe/evae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 05/17/2024] [Accepted: 08/22/2024] [Indexed: 08/28/2024] Open
Abstract
There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present [nonreference sequence (NRS)] in hg38, which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) sequencing data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in an underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.
Collapse
Affiliation(s)
- Çiğdem Köroğlu
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Peng Chen
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Michael Traurig
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Serdar Altok
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Clifton Bogardus
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Leslie J Baier
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| |
Collapse
|
12
|
Negi S, Stenton SL, Berger SI, McNulty B, Violich I, Gardner J, Hillaker T, O'Rourke SM, O'Leary MC, Carbonell E, Austin-Tse C, Lemire G, Serrano J, Mangilog B, VanNoy G, Kolmogorov M, Vilain E, O'Donnell-Luria A, Délot E, Miga KH, Monlong J, Paten B. Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.08.22.24312327. [PMID: 39228712 PMCID: PMC11370519 DOI: 10.1101/2024.08.22.24312327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2024]
Abstract
More than 50% of families with suspected rare monogenic diseases remain unsolved after whole genome analysis by short read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing, and providing haplotype-resolved methylation profiling. To evaluate LRS's additional diagnostic yield, we sequenced a rare disease cohort of 98 samples, including 41 probands and some family members, using nanopore sequencing, achieving per sample ∼36x average coverage and 32 kilobase (kb) read N50 from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. LRS covered, on average, coding exons in ∼280 genes and ∼5 known Mendelian disease genes that were not covered by SRS. In comparison to SRS, LRS detected additional rare, functionally annotated variants, including SVs and tandem repeats, and completely phased 87% of protein-coding genes. LRS detected additional de novo variants, and could be used to distinguish postzygotic mosaic variants from prezygotic de novos . Eleven probands were solved, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates LRS's potential to enhance diagnostic yield for rare monogenic diseases, implying utility in future clinical genomics workflows.
Collapse
|
13
|
LeMaster C, Schwendinger-Schreck C, Ge B, Cheung WA, McLennan R, Johnston JJ, Pastinen T, Smail C. Mapping structural variants to rare disease genes using long-read whole genome sequencing and trait-relevant polygenic scores. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.15.24304216. [PMID: 38562793 PMCID: PMC10984062 DOI: 10.1101/2024.03.15.24304216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
Recent studies have revealed the pervasive landscape of rare structural variants (rSVs) present in human genomes. rSVs can have extreme effects on the expression of proximal genes and, in a rare disease context, have been implicated in patient cases where no diagnostic single nucleotide variant (SNV) was found. Approaches for integrating rSVs to date have focused on targeted approaches in known Mendelian rare disease genes. This approach is intractable for rare diseases with many causal loci or patients with complex, multi-phenotype syndromes. We hypothesized that integrating trait-relevant polygenic scores (PGS) would provide a substantial reduction in the number of candidate disease genes in which to assess rSV effects. We further implemented a method for ranking PGS genes to define a set of core/key genes where a rSV has the potential to exert relatively larger effects on disease risk. Among a subset of patients enrolled in the Genomic Answers for Kids (GA4K) rare disease program (N=497), we used PacBio HiFi long-read whole genome sequencing (lrWGS) to identify rSVs intersecting genes in trait-relevant PGSs. Illustrating our approach in Autism (N=54 cases), we identified 22, 019 deletions, 2,041 duplications, 87,826 insertions, and 214 inversions overlapping putative core/key PGS genes. Additionally, by integrating genomic constraint annotations from gnomAD, we observed that rare duplications overlapping putative core/key PGS genes were frequently in higher constraint regions compared to controls (P = 1×10-03). This difference was not observed in the lowest-ranked gene set (P = 0.15). Overall, our study provides a framework for the annotation of long-read rSVs from lrWGS data and prioritization of disease-linked genomic regions for downstream functional validation of rSV impacts. To enable reuse by other researchers, we have made SV allele frequencies and gene associations freely available.
Collapse
Affiliation(s)
- Cas LeMaster
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Carl Schwendinger-Schreck
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Bing Ge
- McGill University, Montreal, Quebec, Canada
| | - Warren A. Cheung
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Rebecca McLennan
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Jeffrey J. Johnston
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Tomi Pastinen
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Craig Smail
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
| |
Collapse
|