1
|
English AC, Dolzhenko E, Ziaei Jam H, McKenzie SK, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat Biotechnol 2024:10.1038/s41587-024-02225-z. [PMID: 38671154 DOI: 10.1038/s41587-024-02225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2023] [Accepted: 03/28/2024] [Indexed: 04/28/2024]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
Collapse
Affiliation(s)
- Adam C English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
| | | | - Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | | | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Jonghun Park
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
2
|
König E, Mitchell JS, Filosi M, Fuchsberger C. Impact of the inaccessible genome on genotype imputation and genome-wide association studies. Hum Mol Genet 2024:ddae062. [PMID: 38643062 DOI: 10.1093/hmg/ddae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2023] [Revised: 03/03/2024] [Accepted: 03/25/2024] [Indexed: 04/22/2024] Open
Abstract
Genotype imputation is widely used in genome-wide association studies (GWAS). However, both the genotyping chips and imputation reference panels are dependent on next-generation sequencing (NGS). Due to the nature of NGS, some regions of the genome are inaccessible to sequencing. To date, there has been no complete evaluation of these regions and their impact on the identification of associations in GWAS remains unclear. In this study, we systematically assess the extent to which variants in inaccessible regions are underrepresented on genotyping chips and imputation reference panels, in GWAS results and in variant databases. We also determine the proportion of genes located in inaccessible regions and compare the results across variant masks defined by the 1000 Genomes Project and the TOPMed program. Overall, fewer variants were observed in inaccessible regions in all categories analyzed. Depending on the mask used and normalized for region size, only 4%-17% of the genotyped variants are located in inaccessible regions and 52 to 581 genes were almost completely inaccessible. From the Cooperative Health Research in South Tyrol (CHRIS) study, we present a case study of an association located in an inaccessible region that is driven by genotyped variants and cannot be reproduced by imputation in GRCh37. We conclude that genotyping, NGS, genotype imputation and downstream analyses such as GWAS and fine mapping are systematically biased in inaccessible regions, due to missed variants and spurious associations. To help researchers assess gene and variant accessibility, we provide an online application (https://gab.gm.eurac.edu).
Collapse
Affiliation(s)
- Eva König
- Institute for Biomedicine (affiliated to the University of Lübeck), Eurac Research, Via Volta 21, Bolzano 39100, Italy
| | - Jonathan Stewart Mitchell
- Institute for Biomedicine (affiliated to the University of Lübeck), Eurac Research, Via Volta 21, Bolzano 39100, Italy
| | - Michele Filosi
- Institute for Biomedicine (affiliated to the University of Lübeck), Eurac Research, Via Volta 21, Bolzano 39100, Italy
| | - Christian Fuchsberger
- Institute for Biomedicine (affiliated to the University of Lübeck), Eurac Research, Via Volta 21, Bolzano 39100, Italy
| |
Collapse
|
3
|
Trégouët DA, Morange PE. Next-generation sequencing strategies in venous thromboembolism: in whom and for what purpose? J Thromb Haemost 2024:S1538-7836(24)00218-6. [PMID: 38641321 DOI: 10.1016/j.jtha.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
This invited review follows the oral presentation "To Sequence or Not to Sequence, That Is Not the Question; But 'When, Who, Which and What For?' Is" given during the State of the Art session "Translational Genomics in Thrombosis: From OMICs to Clinics" of the International Society on Thrombosis and Haemostasis 2023 Congress. Emphasizing the power of next-generation sequencing technologies and the diverse strategies associated with DNA variant analysis, this review highlights the unresolved questions and challenges in their implementation both for the clinical diagnosis of venous thromboembolism and in translational research.
Collapse
Affiliation(s)
- David-Alexandre Trégouët
- University of Bordeaux, Institut National de la Santé et de la Recherche Médicale, Bordeaux Population Health Research Center, Unité Mixte de Recherche 1219, Bordeaux, France.
| | - Pierre-Emmanuel Morange
- Cardiovascular and Nutrition Research Center (Centre de Recherche en CardioVasculaire et Nutrition), Institut National de la Santé et de la Recherche Médicale, Institut National de Recherche pour l'agriculture, l' Alimentation et l'Environnement, Aix-Marseille University, Marseille, France
| |
Collapse
|
4
|
Hadar N, Dolgin V, Oustinov K, Yogev Y, Poleg T, Safran A, Freund O, Agam N, Jean MM, Proskorovski-Ohayon R, Wormser O, Drabkin M, Halperin D, Eskin-Schwartz M, Narkis G, Sued-Hendrickson S, Aminov I, Gombosh M, Aharoni S, Birk OS. VARista: a free web platform for streamlined whole-genome variant analysis across T2T, hg38, and hg19. Hum Genet 2024:10.1007/s00439-024-02671-4. [PMID: 38607411 DOI: 10.1007/s00439-024-02671-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2023] [Accepted: 03/24/2024] [Indexed: 04/13/2024]
Abstract
With the increasing importance of genomic data in understanding genetic diseases, there is an essential need for efficient and user-friendly tools that simplify variant analysis. Although multiple tools exist, many present barriers such as steep learning curves, limited reference genome compatibility, or costs. We developed VARista, a free web-based tool, to address these challenges and provide a streamlined solution for researchers, particularly those focusing on rare monogenic diseases. VARista offers a user-centric interface that eliminates much of the technical complexity typically associated with variant analysis. The tool directly supports VCF files generated using reference genomes hg19, hg38, and the emerging T2T, with seamless remapping capabilities between them. Features such as gene summaries and links, tissue and cell-specific gene expression data for both adults and fetuses, as well as automated PCR design and integration with tools such as SpliceAI and AlphaMissense, enable users to focus on the biology and the case itself. As we demonstrate, VARista proved effective in narrowing down potential disease-causing variants, prioritizing them effectively, and providing meaningful biological context, facilitating rapid decision-making. VARista stands out as a freely available and comprehensive tool that consolidates various aspects of variant analysis into a single platform that embraces the forefront of genomic advancements. Its design inherently supports a shift in focus from technicalities to critical thinking, thereby promoting better-informed decisions in genetic disease research. Given its unique capabilities and user-centric design, VARista has the potential to become an essential asset for the genomic research community. https://VARista.link.
Collapse
Affiliation(s)
- Noam Hadar
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Vadim Dolgin
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Katya Oustinov
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Yuval Yogev
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Tomer Poleg
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Amit Safran
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Ofek Freund
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Nadav Agam
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Matan M Jean
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Regina Proskorovski-Ohayon
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Ohad Wormser
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Max Drabkin
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Daniel Halperin
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Marina Eskin-Schwartz
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Genetics Institute, Soroka University Medical Center, Beer-Sheva, Israel
| | - Ginat Narkis
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Genetics Institute, Soroka University Medical Center, Beer-Sheva, Israel
| | - Sufa Sued-Hendrickson
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Ilana Aminov
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Maya Gombosh
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Sarit Aharoni
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Ohad S Birk
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel.
- Genetics Institute, Soroka University Medical Center, Beer-Sheva, Israel.
| |
Collapse
|
5
|
Zhang S, Xu N, Fu L, Yang X, Li Y, Yang Z, Feng Y, Ma K, Jiang X, Han J, Hu R, Zhang L, de Gennaro L, Ryabov F, Meng D, He Y, Wu D, Yang C, Paparella A, Mao Y, Bian X, Lu Y, Antonacci F, Ventura M, Shepelev VA, Miga KH, Alexandrov IA, Logsdon GA, Phillippy AM, Su B, Zhang G, Eichler EE, Lu Q, Shi Y, Sun Q, Mao Y. Comparative genomics of macaques and integrated insights into genetic variation and population history. bioRxiv 2024:2024.04.07.588379. [PMID: 38645259 PMCID: PMC11030432 DOI: 10.1101/2024.04.07.588379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/23/2024]
Abstract
The crab-eating macaques ( Macaca fascicularis ) and rhesus macaques ( M. mulatta ) are widely studied nonhuman primates in biomedical and evolutionary research. Despite their significance, the current understanding of the complex genomic structure in macaques and the differences between species requires substantial improvement. Here, we present a complete genome assembly of a crab-eating macaque and 20 haplotype-resolved macaque assemblies to investigate the complex regions and major genomic differences between species. Segmental duplication in macaques is ∼42% lower, while centromeres are ∼3.7 times longer than those in humans. The characterization of ∼2 Mbp fixed genetic variants and ∼240 Mbp complex loci highlights potential associations with metabolic differences between the two macaque species (e.g., CYP2C76 and EHBP1L1 ). Additionally, hundreds of alternative splicing differences show post-transcriptional regulation divergence between these two species (e.g., PNPO ). We also characterize 91 large-scale genomic differences between macaques and humans at a single-base-pair resolution and highlight their impact on gene regulation in primate evolution (e.g., FOLH1 and PIEZO2 ). Finally, population genetics recapitulates macaque speciation and selective sweeps, highlighting potential genetic basis of reproduction and tail phenotype differences (e.g., STAB1 , SEMA3F , and HOXD13 ). In summary, the integrated analysis of genetic variation and population genetics in macaques greatly enhances our comprehension of lineage-specific phenotypes, adaptation, and primate evolution, thereby improving their biomedical applications in human diseases.
Collapse
|
6
|
Boakye Serebour T, Cribbs AP, Baldwin MJ, Masimirembwa C, Chikwambi Z, Kerasidou A, Snelling SJB. Overcoming barriers to single-cell RNA sequencing adoption in low- and middle-income countries. Eur J Hum Genet 2024:10.1038/s41431-024-01564-4. [PMID: 38565638 DOI: 10.1038/s41431-024-01564-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 01/29/2024] [Accepted: 02/06/2024] [Indexed: 04/04/2024] Open
Abstract
The advent of single-cell resolution sequencing and spatial transcriptomics has enabled the delivery of cellular and molecular atlases of tissues and organs, providing new insights into tissue health and disease. However, if the full potential of these technologies is to be equitably realised, ancestrally inclusivity is paramount. Such a goal requires greater inclusion of both researchers and donors in low- and middle-income countries (LMICs). In this perspective, we describe the current landscape of ancestral inclusivity in genomic and single-cell transcriptomic studies. We discuss the collaborative efforts needed to scale the barriers to establishing, expanding, and adopting single-cell sequencing research in LMICs and to enable globally impactful outcomes of these technologies.
Collapse
Affiliation(s)
- Tracy Boakye Serebour
- The Botnar Institute for Musculoskeletal Science, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Adam P Cribbs
- The Botnar Institute for Musculoskeletal Science, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Mathew J Baldwin
- The Botnar Institute for Musculoskeletal Science, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Collen Masimirembwa
- The African Institute of Biomedical Science and Technology, Harare, Zimbabwe
| | - Zedias Chikwambi
- The African Institute of Biomedical Science and Technology, Harare, Zimbabwe
| | - Angeliki Kerasidou
- The Ethox Centre and the Wellcome Centre for Ethics and Humanities, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Sarah J B Snelling
- The Botnar Institute for Musculoskeletal Science, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.
| |
Collapse
|
7
|
Yuan CU, Quah FX, Hemberg M. Single-cell and spatial transcriptomics: Bridging current technologies with long-read sequencing. Mol Aspects Med 2024; 96:101255. [PMID: 38368637 DOI: 10.1016/j.mam.2024.101255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Revised: 01/30/2024] [Accepted: 02/07/2024] [Indexed: 02/20/2024]
Abstract
Single-cell technologies have transformed biomedical research over the last decade, opening up new possibilities for understanding cellular heterogeneity, both at the genomic and transcriptomic level. In addition, more recent developments of spatial transcriptomics technologies have made it possible to profile cells in their tissue context. In parallel, there have been substantial advances in sequencing technologies, and the third generation of methods are able to produce reads that are tens of kilobases long, with error rates matching the second generation short reads. Long reads technologies make it possible to better map large genome rearrangements and quantify isoform specific abundances. This further improves our ability to characterize functionally relevant heterogeneity. Here, we show how researchers have begun to combine single-cell, spatial transcriptomics, and long-read technologies, and how this is resulting in powerful new approaches to profiling both the genome and the transcriptome. We discuss the achievements so far, and we highlight remaining challenges and opportunities.
Collapse
Affiliation(s)
- Chengwei Ulrika Yuan
- Department of Biochemistry, University of Cambridge, Cambridge, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Fu Xiang Quah
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Martin Hemberg
- Gene Lay Institute, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
| |
Collapse
|
8
|
Wu Z, Li T, Jiang Z, Zheng J, Gu Y, Liu Y, Liu Y, Xie Z. Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Res 2024; 52:2212-2230. [PMID: 38364871 PMCID: PMC10954445 DOI: 10.1093/nar/gkae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Revised: 01/18/2024] [Accepted: 01/27/2024] [Indexed: 02/18/2024] Open
Abstract
Nonreference sequences (NRSs) are DNA sequences present in global populations but absent in the current human reference genome. However, the extent and functional significance of NRSs in the human genomes and populations remains unclear. Here, we de novo assembled 539 genomes from five genetically divergent human populations using long-read sequencing technology, resulting in the identification of 5.1 million NRSs. These were merged into 45284 unique NRSs, with 29.7% being novel discoveries. Among these NRSs, 38.7% were common across the five populations, and 35.6% were population specific. The use of a graph-based pangenome approach allowed for the detection of 565 transcript expression quantitative trait loci on NRSs, with 426 of these being novel findings. Moreover, 26 NRS candidates displayed evidence of adaptive selection within human populations. Genes situated in close proximity to or intersecting with these candidates may be associated with metabolism and type 2 diabetes. Genome-wide association studies revealed 14 NRSs to be significantly associated with eight phenotypes. Additionally, 154 NRSs were found to be in strong linkage disequilibrium with 258 phenotype-associated SNPs in the GWAS catalogue. Our work expands the understanding of human NRSs and provides novel insights into their functions, facilitating evolutionary and biomedical researches.
Collapse
Affiliation(s)
- Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Tong Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Zehang Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Jingjing Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Yizhou Gu
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- University of Wisconsin-Madison, WI, USA
| | - Yizhi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Yun Liu
- MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Shanghai Xuhui Central Hospital, Fudan University, Shanghai, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
| |
Collapse
|
9
|
Lin Y, Li J, Gu Y, Jin L, Bai J, Zhang J, Wang Y, Liu P, Long K, He M, Li D, Liu C, Han Z, Zhang Y, Li X, Zeng B, Lu L, Kong F, Sun Y, Fan Y, Wang X, Wang T, Jiang A, Ma J, Shen L, Zhu L, Jiang Y, Tang G, Fan X, Liu Q, Li H, Wang J, Chen L, Ge L, Li X, Tang Q, Li M. Haplotype-resolved 3D chromatin architecture of the hybrid pig. Genome Res 2024; 34:310-325. [PMID: 38479837 PMCID: PMC10984390 DOI: 10.1101/gr.278101.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 02/15/2024] [Indexed: 03/22/2024]
Abstract
In diploid mammals, allele-specific three-dimensional (3D) genome architecture may lead to imbalanced gene expression. Through ultradeep in situ Hi-C sequencing of three representative somatic tissues (liver, skeletal muscle, and brain) from hybrid pigs generated by reciprocal crosses of phenotypically and physiologically divergent Berkshire and Tibetan pigs, we uncover extensive chromatin reorganization between homologous chromosomes across multiple scales. Haplotype-based interrogation of multi-omic data revealed the tissue dependence of 3D chromatin conformation, suggesting that parent-of-origin-specific conformation may drive gene imprinting. We quantify the effects of genetic variations and histone modifications on allelic differences of long-range promoter-enhancer contacts, which likely contribute to the phenotypic differences between the parental pig breeds. We also observe the fine structure of somatically paired homologous chromosomes in the pig genome, which has a functional implication genome-wide. This work illustrates how allele-specific chromatin architecture facilitates concomitant shifts in allele-biased gene expression, as well as the possible consequential phenotypic changes in mammals.
Collapse
Affiliation(s)
- Yu Lin
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Jing Li
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China;
| | - Yiren Gu
- College of Animal and Veterinary Sciences, Southwest Minzu University, Chengdu 610041, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu 610066, China
| | - Long Jin
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Jingyi Bai
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Jiaman Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Yujie Wang
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Pengliang Liu
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Keren Long
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Mengnan He
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Diyan Li
- School of Pharmacy, Chengdu University, Chengdu 610106, China
| | - Can Liu
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Ziyin Han
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Yu Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Xiaokai Li
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Bo Zeng
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Lu Lu
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Fanli Kong
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Ying Sun
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Institute of Geriatric Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 610072, China
| | - Yongliang Fan
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Xun Wang
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Tao Wang
- School of Pharmacy, Chengdu University, Chengdu 610106, China
| | - An'an Jiang
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Jideng Ma
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Linyuan Shen
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Li Zhu
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Yanzhi Jiang
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Guoqing Tang
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Xiaolan Fan
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Qingyou Liu
- Animal Molecular Design and Precise Breeding Key Laboratory of Guangdong Province, School of Life Science and Engineering, Foshan University, Foshan 528225, China
| | - Hua Li
- Animal Molecular Design and Precise Breeding Key Laboratory of Guangdong Province, School of Life Science and Engineering, Foshan University, Foshan 528225, China
| | - Jinyong Wang
- Pig Industry Sciences Key Laboratory of Ministry of Agriculture and Rural Affairs, Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
| | - Li Chen
- Pig Industry Sciences Key Laboratory of Ministry of Agriculture and Rural Affairs, Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
| | - Liangpeng Ge
- Pig Industry Sciences Key Laboratory of Ministry of Agriculture and Rural Affairs, Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
| | - Xuewei Li
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
| | - Qianzi Tang
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China;
| | - Mingzhou Li
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China;
| |
Collapse
|
10
|
Paulin LF, Fan J, O'Neill K, Pleasance E, Porter VL, Jones SJM, Sedlazeck FJ. The benefit of a complete reference genome for cancer structural variant analysis. medRxiv 2024:2024.03.15.24304369. [PMID: 38562786 PMCID: PMC10984048 DOI: 10.1101/2024.03.15.24304369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
The complexities of cancer genomes are becoming more easily interpreted due to advancements in sequencing technologies and improved bioinformatic analysis. Structural variants (SVs) represent an important subset of somatic events in tumors. While detection of SVs has been markedly improved by the development of long-read sequencing, somatic variant identification and annotation remains challenging. We hypothesized that use of a completed human reference genome (CHM13-T2T) would improve somatic SV calling. Our findings in a tumour/normal matched benchmark sample and two patient samples show that the CHM13-T2T improves SV detection and prioritization accuracy compared to GRCh38, with a notable reduction in false positive calls. We also overcame the lack of annotation resources for CHM13-T2T by lifting over CHM13-T2T-aligned reads to the GRCh38 genome, therefore combining both improved alignment and advanced annotations. In this process, we assessed the current SV benchmark set for COLO829/COLO829BL across four replicates sequenced at different centers with different long-read technologies. We discovered instability of this cell line across these replicates; 346 SVs (1.13%) were only discoverable in a single replicate. We identify 49 somatic SVs, which appear to be stable as they are consistently present across the four replicates. As such, we propose this consensus set as an updated benchmark for somatic SV calling and include both GRCh38 and CHM13-T2T coordinates in our benchmark. The benchmark is available at: 10.5281/zenodo.10819636 Our work demonstrates new approaches to optimize somatic SV prioritization in cancer with potential improvements in other genetic diseases.
Collapse
Affiliation(s)
- Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Jeremy Fan
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
| | - Kieran O'Neill
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
| | - Erin Pleasance
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
| | - Vanessa L Porter
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Steven J M Jones
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| |
Collapse
|
11
|
Chorlton SD. Ten common issues with reference sequence databases and how to mitigate them. Front Bioinform 2024; 4:1278228. [PMID: 38560517 PMCID: PMC10978663 DOI: 10.3389/fbinf.2024.1278228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Accepted: 03/05/2024] [Indexed: 04/04/2024] Open
Abstract
Metagenomic sequencing has revolutionized our understanding of microbiology. While metagenomic tools and approaches have been extensively evaluated and benchmarked, far less attention has been given to the reference sequence database used in metagenomic classification. Issues with reference sequence databases are pervasive. Database contamination is the most recognized issue in the literature; however, it remains relatively unmitigated in most analyses. Other common issues with reference sequence databases include taxonomic errors, inappropriate inclusion and exclusion criteria, and sequence content errors. This review covers ten common issues with reference sequence databases and the potential downstream consequences of these issues. Mitigation measures are discussed for each issue, including bioinformatic tools and database curation strategies. Together, these strategies present a path towards more accurate, reproducible and translatable metagenomic sequencing.
Collapse
|
12
|
Annapragada AV, Niknafs N, White JR, Bruhm DC, Cherry C, Medina JE, Adleff V, Hruban C, Mathios D, Foda ZH, Phallen J, Scharpf RB, Velculescu VE. Genome-wide repeat landscapes in cancer and cell-free DNA. Sci Transl Med 2024; 16:eadj9283. [PMID: 38478628 DOI: 10.1126/scitranslmed.adj9283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2023] [Accepted: 02/16/2024] [Indexed: 03/22/2024]
Abstract
Genetic changes in repetitive sequences are a hallmark of cancer and other diseases, but characterizing these has been challenging using standard sequencing approaches. We developed a de novo kmer finding approach, called ARTEMIS (Analysis of RepeaT EleMents in dISease), to identify repeat elements from whole-genome sequencing. Using this method, we analyzed 1.2 billion kmers in 2837 tissue and plasma samples from 1975 patients, including those with lung, breast, colorectal, ovarian, liver, gastric, head and neck, bladder, cervical, thyroid, or prostate cancer. We identified tumor-specific changes in these patients in 1280 repeat element types from the LINE, SINE, LTR, transposable element, and human satellite families. These included changes to known repeats and 820 elements that were not previously known to be altered in human cancer. Repeat elements were enriched in regions of driver genes, and their representation was altered by structural changes and epigenetic states. Machine learning analyses of genome-wide repeat landscapes and fragmentation profiles in cfDNA detected patients with early-stage lung or liver cancer in cross-validated and externally validated cohorts. In addition, these repeat landscapes could be used to noninvasively identify the tissue of origin of tumors. These analyses reveal widespread changes in repeat landscapes of human cancers and provide an approach for their detection and characterization that could benefit early detection and disease monitoring of patients with cancer.
Collapse
Affiliation(s)
- Akshaya V Annapragada
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Noushin Niknafs
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - James R White
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Daniel C Bruhm
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Christopher Cherry
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Jamie E Medina
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Vilmos Adleff
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Carolyn Hruban
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Dimitrios Mathios
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Zachariah H Foda
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Jillian Phallen
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Robert B Scharpf
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Victor E Velculescu
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| |
Collapse
|
13
|
Porubsky D, Eichler EE. A 25-year odyssey of genomic technology advances and structural variant discovery. Cell 2024; 187:1024-1037. [PMID: 38290514 DOI: 10.1016/j.cell.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2023] [Revised: 12/20/2023] [Accepted: 01/02/2024] [Indexed: 02/01/2024]
Abstract
This perspective focuses on advances in genome technology over the last 25 years and their impact on germline variant discovery within the field of human genetics. The field has witnessed tremendous technological advances from microarrays to short-read sequencing and now long-read sequencing. Each technology has provided genome-wide access to different classes of human genetic variation. We are now on the verge of comprehensive variant detection of all forms of variation for the first time with a single assay. We predict that this transition will further transform our understanding of human health and biology and, more importantly, provide novel insights into the dynamic mutational processes shaping our genomes.
Collapse
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
| |
Collapse
|
14
|
Zhang Y, Hou G, Shen N. Non-coding DNA variants for risk in lupus. Best Pract Res Clin Rheumatol 2024:101937. [PMID: 38429183 DOI: 10.1016/j.berh.2024.101937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Revised: 02/08/2024] [Accepted: 02/12/2024] [Indexed: 03/03/2024]
Abstract
Systemic Lupus Erythematosus (SLE) is a multifactorial autoimmune disease that arises from a dynamic interplay between genetics and environmental triggers. The advent of sophisticated genomics technology has catalyzed a shift in our understanding of disease etiology, spotlighting the pivotal role of non-coding DNA variants in SLE pathogenesis. In this review, we present a comprehensive examination of the non-coding variants associated with SLE, shedding light on their role in influencing disease risk and progression. We discuss the latest methodological advancements that have been instrumental in the identification and functional characterization of these genomic elements, with a special focus on the transformative power of CRISPR-based gene-editing technologies. Additionally, the review probes into the therapeutic opportunities that arise from modulating non-coding regions associated with SLE. Through an exploration of the complex network of non-coding DNA, this review aspires to decode the genetic puzzle of SLE and set the stage for groundbreaking gene-based therapeutic interventions and the advancement of precision medicine strategies tailored to SLE management.
Collapse
Affiliation(s)
- Yutong Zhang
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001, China
| | - Guojun Hou
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001, China
| | - Nan Shen
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, 200001, China.
| |
Collapse
|
15
|
Cerdán-Vélez D, Tress ML. The T2T-CHM13 reference assembly uncovers essential WASH1 and GPRIN2 paralogues. Bioinform Adv 2024; 4:vbae029. [PMID: 38464973 PMCID: PMC10924726 DOI: 10.1093/bioadv/vbae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Revised: 01/02/2024] [Accepted: 02/26/2024] [Indexed: 03/12/2024]
Abstract
Summary The recently published T2T-CHM13 reference assembly completed the annotation of the final 8% of the human genome. It introduced 1956 genes, close to 100 of which are predicted to be coding because they have a protein coding parent gene. Here, we confirm the coding status and functional relevance of two of these genes, paralogues of WASHC1 and GPRIN2. We find that LOC124908094, one of four novel subtelomeric WASH1 genes uncovered in the new assembly, produces the WASH1 protein that forms part of the vital actin-regulatory WASH complex. Its coding status is supported by abundant proteomics, conservation, and cDNA evidence. It was previously assumed that gene WASHC1 produced the functional WASH1 protein, but new evidence shows that WASHC1 is a human-derived duplication and likely to be one of 12 WASH1 pseudogenes in the human gene set. We also find that the T2T-CHM13 assembly has added a functionally important copy of GPRIN2 to the human gene set. We demonstrate that uniquely mapping peptides from proteomics databases support the novel LOC124900631 rather than the GRCh38 assembly GPRIN2 gene. These new additions to the set of human coding genes underlines the importance of the new T2T-CHM13 assembly. Availability and implementation None.
Collapse
Affiliation(s)
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| | - Michael Liam Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| |
Collapse
|
16
|
Kingsmore SF, Nofsinger R, Ellsworth K. Rapid genomic sequencing for genetic disease diagnosis and therapy in intensive care units: a review. NPJ Genom Med 2024; 9:17. [PMID: 38413639 PMCID: PMC10899612 DOI: 10.1038/s41525-024-00404-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 02/15/2024] [Indexed: 02/29/2024] Open
Abstract
Single locus (Mendelian) diseases are a leading cause of childhood hospitalization, intensive care unit (ICU) admission, mortality, and healthcare cost. Rapid genome sequencing (RGS), ultra-rapid genome sequencing (URGS), and rapid exome sequencing (RES) are diagnostic tests for genetic diseases for ICU patients. In 44 studies of children in ICUs with diseases of unknown etiology, 37% received a genetic diagnosis, 26% had consequent changes in management, and net healthcare costs were reduced by $14,265 per child tested by URGS, RGS, or RES. URGS outperformed RGS and RES with faster time to diagnosis, and higher rate of diagnosis and clinical utility. Diagnostic and clinical outcomes will improve as methods evolve, costs decrease, and testing is implemented within precision medicine delivery systems attuned to ICU needs. URGS, RGS, and RES are currently performed in <5% of the ~200,000 children likely to benefit annually due to lack of payor coverage, inadequate reimbursement, hospital policies, hospitalist unfamiliarity, under-recognition of possible genetic diseases, and current formatting as tests rather than as a rapid precision medicine delivery system. The gap between actual and optimal outcomes in children in ICUs is currently increasing since expanded use of URGS, RGS, and RES lags growth in those likely to benefit through new therapies. There is sufficient evidence to conclude that URGS, RGS, or RES should be considered in all children with diseases of uncertain etiology at ICU admission. Minimally, diagnostic URGS, RGS, or RES should be ordered early during admissions of critically ill infants and children with suspected genetic diseases.
Collapse
Affiliation(s)
- Stephen F Kingsmore
- Rady Children's Institute for Genomic Medicine, Rady Children's Hospital, San Diego, CA, USA.
| | - Russell Nofsinger
- Rady Children's Institute for Genomic Medicine, Rady Children's Hospital, San Diego, CA, USA
| | - Kasia Ellsworth
- Rady Children's Institute for Genomic Medicine, Rady Children's Hospital, San Diego, CA, USA
| |
Collapse
|
17
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024:10.1038/s41576-024-00692-3. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Collapse
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
| | - Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
| |
Collapse
|
18
|
Abondio P, Bruno F, Passarino G, Montesanto A, Luiselli D. Pangenomics: A new era in the field of neurodegenerative diseases. Ageing Res Rev 2024; 94:102180. [PMID: 38163518 DOI: 10.1016/j.arr.2023.102180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 12/14/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024]
Abstract
A pangenome is composed of all the genetic variability of a group of individuals, and its application to the study of neurodegenerative diseases may provide valuable insights into the underlying aspects of genetic heterogenetiy for these complex ailments, including gene expression, epigenetics, and translation mechanisms. Furthermore, a reference pangenome allows for the identification of previously undetected structural commonalities and differences among individuals, which may help in the diagnosis of a disease, support the prediction of what will happen over time (prognosis) and aid in developing novel treatments in the perspective of personalized medicine. Therefore, in the present review, the application of the pangenome concept to the study of neurodegenerative diseases will be discussed and analyzed for its potential to enable an improvement in diagnosis and prognosis for these illnesses, leading to the development of tailored treatments for individual patients from the knowledge of the genomic composition of a whole population.
Collapse
Affiliation(s)
- Paolo Abondio
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy.
| | - Francesco Bruno
- Academy of Cognitive Behavioral Sciences of Calabria (ASCoC), Lamezia Terme, Italy; Regional Neurogenetic Centre (CRN), Department of Primary Care, Azienda Sanitaria Provinciale Di Catanzaro, Viale A. Perugini, 88046 Lamezia Terme, CZ, Italy; Association for Neurogenetic Research (ARN), Lamezia Terme, CZ, Italy
| | - Giuseppe Passarino
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
| | - Alberto Montesanto
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
| | - Donata Luiselli
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
| |
Collapse
|
19
|
Cao X, Sun S, Xing J. A Massive Proteogenomic Screen Identifies Thousands of Novel Peptides From the Human "Dark" Proteome. Mol Cell Proteomics 2024; 23:100719. [PMID: 38242438 PMCID: PMC10867589 DOI: 10.1016/j.mcpro.2024.100719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Revised: 01/01/2024] [Accepted: 01/16/2024] [Indexed: 01/21/2024] Open
Abstract
Although the human gene annotation has been continuously improved over the past 2 decades, numerous studies demonstrated the existence of a "dark proteome", consisting of proteins that were critical for biological processes but not included in widely used gene catalogs. The Genotype-Tissue Expression project generated more than 15,000 RNA-seq datasets from multiple tissues, which modeled 30 million transcripts in the human genome. To provide a resource of high-confidence novel proteins from the dark proteome, we screened 50,000 mass spectrometry runs from over 900 projects to identify proteins translated from the Genotype-Tissue Expression transcript model with proteomic support. We also integrated 3.8 million common genetic variants from the gnomAD database to improve peptide identification. As a result, we identified 170,529 novel peptides with proteomic evidence, of which 6048 passed the strictest standard we defined and were supported by PepQuery. We provided a user-friendly website (https://ncorf.genes.fun/) for researchers to check the evidence of novel peptides from their studies. The findings will improve our understanding of coding genes and facilitate genomic data interpretation in biomedical research.
Collapse
Affiliation(s)
- Xiaolong Cao
- Department of Anesthesiology, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, China; Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Siqi Sun
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Jinchuan Xing
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA.
| |
Collapse
|
20
|
Zhou R, Guo J, Feng X, Zhou W. Mechanisms of the role of proto-oncogene activation in promoting malignant transformation of mature B cells. Zhong Nan Da Xue Xue Bao Yi Xue Ban 2024; 49:113-121. [PMID: 38615172 PMCID: PMC11017026 DOI: 10.11817/j.issn.1672-7347.2024.230304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Indexed: 04/15/2024]
Abstract
Malignant tumors continue to pose a significant threat to human life and safety and their development is primarily due to the activation of proto-oncogenes and the inactivation of suppressor genes. Among these, the activation of proto-oncogenes possesses greater potential to drive the malignant transformation of cells. Targeting oncogenes involved in the malignant transformation of tumor cells has provided a novel approach for the development of current antitumor drugs. Several preclinical and clinical studies have revealed that the development pathway of B cells, and the malignant transformation of mature B cells into tumors have been regulated by oncogenes and their metabolites. Therefore, summarizing the key oncogenes involved in the process of malignant transformation of mature B cells and elucidating the mechanisms of action in tumor development hold significant importance for the clinical treatment of malignant tumors.
Collapse
Affiliation(s)
- Ruiqi Zhou
- Cancer Research Institute, School of Basic Medical Sciences, Central South University, Changsha 410078.
| | - Jiaojiao Guo
- Department of Hematology, Xiangya Hospital, Central South University, Changsha 410008
| | - Xiangling Feng
- Xiangya School of Public Health, Central South University, Changsha 410006, China
| | - Wen Zhou
- Cancer Research Institute, School of Basic Medical Sciences, Central South University, Changsha 410078.
| |
Collapse
|
21
|
Barbitoff YA, Ushakov MO, Lazareva TE, Nasykhova YA, Glotov AS, Predeus AV. Bioinformatics of germline variant discovery for rare disease diagnostics: current approaches and remaining challenges. Brief Bioinform 2024; 25:bbad508. [PMID: 38271481 PMCID: PMC10810331 DOI: 10.1093/bib/bbad508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 11/18/2023] [Accepted: 12/12/2023] [Indexed: 01/27/2024] Open
Abstract
Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.
Collapse
Affiliation(s)
- Yury A Barbitoff
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
| | - Mikhail O Ushakov
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Tatyana E Lazareva
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Yulia A Nasykhova
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Andrey S Glotov
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Alexander V Predeus
- Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
| |
Collapse
|
22
|
Ungar RA, Goddard PC, Jensen TD, Degalez F, Smith KS, Jin CA, Bonner DE, Bernstein JA, Wheeler MT, Montgomery SB. Impact of genome build on RNA-seq interpretation and diagnostics. medRxiv 2024:2024.01.11.24301165. [PMID: 38260490 PMCID: PMC10802764 DOI: 10.1101/2024.01.11.24301165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent genome build also impacts transcriptomics analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network (UDN) and Genomics Research to Elucidate the Genetics of Rare Disease (GREGoR) Consortium. We identified 2,800 genes with build-dependent quantification across six routinely-collected biospecimens, including 1,391 protein-coding genes and 341 known rare disease genes. We further observed multiple genes that only have detectable expression in a subset of genome builds. Finally, we characterized how genome build impacts the detection of outlier transcriptomic events. Combined, we provide a database of genes impacted by build choice, and recommend that transcriptomics-guided analyses and diagnoses are cross-referenced with these data for robustness.
Collapse
Affiliation(s)
- Rachel A. Ungar
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | - Pagé C. Goddard
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | - Tanner D. Jensen
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | | | - Kevin S. Smith
- Department of Pathology, School of Medicine, Stanford University
| | | | | | - Devon E. Bonner
- Department of Pediatrics, School of Medicine, Stanford University
- Stanford Center for Undiagnosed Diseases, Stanford University
| | | | - Matthew T. Wheeler
- Department of Cardiovascular Medicine, School of Medicine, Stanford University
| | - Stephen B. Montgomery
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
| |
Collapse
|
23
|
Genovese G, Rockweiler NB, Gorman BR, Bigdeli TB, Pato MT, Pato CN, Ichihara K, McCarroll SA. BCFtools/liftover: an accurate and comprehensive tool to convert genetic variants across genome assemblies. Bioinformatics 2024; 40:btae038. [PMID: 38261650 PMCID: PMC10832354 DOI: 10.1093/bioinformatics/btae038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Revised: 12/07/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024] Open
Abstract
MOTIVATION Many genetics studies report results tied to genomic coordinates of a legacy genome assembly. However, as assemblies are updated and improved, researchers are faced with either realigning raw sequence data using the updated coordinate system or converting legacy datasets to the updated coordinate system to be able to combine results with newer datasets. Currently available tools to perform the conversion of genetic variants have numerous shortcomings, including poor support for indels and multi-allelic variants, that lead to a higher rate of variants being dropped or incorrectly converted. As a result, many researchers continue to work with and publish using legacy genomic coordinates. RESULTS Here we present BCFtools/liftover, a tool to convert genomic coordinates across genome assemblies for variants encoded in the variant call format with improved support for indels represented by different reference alleles across genome assemblies and full support for multi-allelic variants. It further supports variant annotation fields updates whenever the reference allele changes across genome assemblies. The tool has the lowest rate of variants being dropped with an order of magnitude less indels dropped or incorrectly converted and is an order of magnitude faster than other tools typically used for the same task. It is particularly suited for converting variant callsets from large cohorts to novel telomere-to-telomere assemblies as well as summary statistics from genome-wide association studies tied to legacy genome assemblies. AVAILABILITY AND IMPLEMENTATION The tool is written in C and freely available under the MIT open source license as a BCFtools plugin available at http://github.com/freeseek/score.
Collapse
Affiliation(s)
- Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
| | - Nicole B Rockweiler
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
| | - Bryan R Gorman
- Center for Data and Computational Sciences, VA Boston HealthCare System, Boston, MA 02130, United States
- Booz Allen Hamilton Inc, McLean, VA 22102, United States
| | - Tim B Bigdeli
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY 11203, United States
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY 11203, United States
- Cooperative Studies Program, VA New York Harbor Healthcare System, Brooklyn, NY 11209, United States
| | - Michelle T Pato
- Department of Psychiatry, Robert Wood Johnson Medical School, New Brunswick, NJ 08901, United States
| | - Carlos N Pato
- Department of Psychiatry, Robert Wood Johnson Medical School, New Brunswick, NJ 08901, United States
| | - Kiku Ichihara
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
| | - Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
| |
Collapse
|
24
|
D’Ambrosi S, García-Vílchez R, Kedra D, Vitali P, Macias-Cámara N, Bárcena L, Gonzalez-Lopez M, Aransay AM, Dietmann S, Hurtado A, Blanco S. Global and single-nucleotide resolution detection of 7-methylguanosine in RNA. RNA Biol 2024; 21:1-18. [PMID: 38566310 PMCID: PMC10993922 DOI: 10.1080/15476286.2024.2337493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/27/2024] [Indexed: 04/04/2024] Open
Abstract
RNA modifications, including N-7-methylguanosine (m7G), are pivotal in governing RNA stability and gene expression regulation. The accurate detection of internal m7G modifications is of paramount significance, given recent associations between altered m7G deposition and elevated expression of the methyltransferase METTL1 in various human cancers. The development of robust m7G detection techniques has posed a significant challenge in the field of epitranscriptomics. In this study, we introduce two methodologies for the global and accurate identification of m7G modifications in human RNA. We introduce borohydride reduction sequencing (Bo-Seq), which provides base resolution mapping of m7G modifications. Bo-Seq achieves exceptional performance through the optimization of RNA depurination and scission, involving the strategic use of high concentrations of NaBH4, neutral pH and the addition of 7-methylguanosine monophosphate (m7GMP) during the reducing reaction. Notably, compared to NaBH4-based methods, Bo-Seq enhances the m7G detection performance, and simplifies the detection process, eliminating the necessity for intricate chemical steps and reducing the protocol duration. In addition, we present an antibody-based approach, which enables the assessment of m7G relative levels across RNA molecules and biological samples, however it should be used with caution due to limitations associated with variations in antibody quality between batches. In summary, our novel approaches address the pressing need for reliable and accessible methods to detect RNA m7G methylation in human cells. These advancements hold the potential to catalyse future investigations in the critical field of epitranscriptomics, shedding light on the complex regulatory roles of m7G in gene expression and its implications in cancer biology.
Collapse
Affiliation(s)
- Silvia D’Ambrosi
- Department of Neurosurgery, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain
| | - Raquel García-Vílchez
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC)-University of Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca (IBSAL), Hospital Universitario de Salamanca, Salamanca, Spain
| | - Darek Kedra
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC)-University of Salamanca, Salamanca, Spain
| | - Patrice Vitali
- Molecular, Cellular and Developmental Biology unit (MCD), Centre de Biologie Integrative (CBI), University of Toulouse, UPS, CNRS, Toulouse, France
| | - Nuria Macias-Cámara
- CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain
| | - Laura Bárcena
- CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain
| | - Monika Gonzalez-Lopez
- CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain
| | - Ana M. Aransay
- CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Madrid, Spain
| | - Sabine Dietmann
- Department of Developmental Biology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
| | - Antonio Hurtado
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC)-University of Salamanca, Salamanca, Spain
| | - Sandra Blanco
- CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC)-University of Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca (IBSAL), Hospital Universitario de Salamanca, Salamanca, Spain
| |
Collapse
|
25
|
Chen NC, Paulin LF, Sedlazeck FJ, Koren S, Phillippy AM, Langmead B. Improved sequence mapping using a complete reference genome and lift-over. Nat Methods 2024; 21:41-49. [PMID: 38036856 DOI: 10.1038/s41592-023-02069-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2022] [Accepted: 10/09/2023] [Indexed: 12/02/2023]
Abstract
Complete, telomere-to-telomere (T2T) genome assemblies promise improved analyses and the discovery of new variants, but many essential genomic resources remain associated with older reference genomes. Thus, there is a need to translate genomic features and read alignments between references. Here we describe a method called levioSAM2 that performs fast and accurate lift-over between assemblies using a whole-genome map. In addition to enabling the use of several references, we demonstrate that aligning reads to a high-quality reference (for example, T2T-CHM13) and lifting to an older reference (for example, Genome reference Consortium (GRC)h38) improves the accuracy of the resulting variant calls on the old reference. By leveraging the quality improvements of T2T-CHM13, levioSAM2 reduces small and structural variant calling errors compared with GRC-based mapping using real short- and long-read datasets. Performance is especially improved for a set of complex medically relevant genes, where the GRC references are lower quality.
Collapse
Affiliation(s)
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| |
Collapse
|
26
|
Zheng ZM. RNA therapy is shining for genetic diseases. Mol Ther Nucleic Acids 2023; 34:102042. [PMID: 37876533 PMCID: PMC10590990 DOI: 10.1016/j.omtn.2023.102042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/26/2023]
Affiliation(s)
- Zhi-Ming Zheng
- Tumor Virus RNA Biology Section, HIV Dynamics and Replication Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
| |
Collapse
|
27
|
Jia P, Dong L, Yang X, Wang B, Bush SJ, Wang T, Lin J, Wang S, Zhao X, Xu T, Che Y, Dang N, Ren L, Zhang Y, Wang X, Liang F, Wang Y, Ruan J, Xia H, Zheng Y, Shi L, Lv Y, Wang J, Ye K. Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet. Genome Biol 2023; 24:277. [PMID: 38049885 PMCID: PMC10694985 DOI: 10.1186/s13059-023-03116-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Accepted: 11/21/2023] [Indexed: 12/06/2023] Open
Abstract
BACKGROUND Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short and long sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technology). RESULTS The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map and for each haplotype. We also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding that of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations which are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity-including those located at long repeat regions, complex structural variants, and de novo mutations-are systematically examined in this study. CONCLUSIONS In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples which, relative to existing benchmarks, offers expanded genomic coverage and insight into complex variant categories.
Collapse
Affiliation(s)
- Peng Jia
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Lianhua Dong
- National Institute of Metrology, Beijing, 100029, China
| | - Xiaofei Yang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
| | - Bo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Tingjie Wang
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Songbo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Xixi Zhao
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Yizhuo Che
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Ningxin Dang
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Yujing Zhang
- National Institute of Metrology, Beijing, 100029, China
| | - Xia Wang
- National Institute of Metrology, Beijing, 100029, China
| | - Fan Liang
- GrandOmics Biosciences, Beijing, 100089, China
| | - Yang Wang
- GrandOmics Biosciences, Beijing, 100089, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Han Xia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Yi Lv
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
| | - Jing Wang
- National Institute of Metrology, Beijing, 100029, China.
| | - Kai Ye
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
- Faculty of Science, Leiden University, Leiden, 2311EZ, The Netherlands.
| |
Collapse
|
28
|
Cabello-Aguilar S, Vendrell JA, Solassol J. A Bioinformatics Toolkit for Next-Generation Sequencing in Clinical Oncology. Curr Issues Mol Biol 2023; 45:9737-9752. [PMID: 38132454 PMCID: PMC10741970 DOI: 10.3390/cimb45120608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 11/28/2023] [Accepted: 12/02/2023] [Indexed: 12/23/2023] Open
Abstract
Next-generation sequencing (NGS) has taken on major importance in clinical oncology practice. With the advent of targeted therapies capable of effectively targeting specific genomic alterations in cancer patients, the development of bioinformatics processes has become crucial. Thus, bioinformatics pipelines play an essential role not only in the detection and in identification of molecular alterations obtained from NGS data but also in the analysis and interpretation of variants, making it possible to transform raw sequencing data into meaningful and clinically useful information. In this review, we aim to examine the multiple steps of a bioinformatics pipeline as used in current clinical practice, and we also provide an updated list of the necessary bioinformatics tools. This resource is intended to assist researchers and clinicians in their genetic data analyses, improving the precision and efficiency of these processes in clinical research and patient care.
Collapse
Affiliation(s)
- Simon Cabello-Aguilar
- Montpellier BioInformatics for Clinical Diagnosis (MOBIDIC), Molecular Medicine and Genomics Platform (PMMG), CHU Montpellier, 34295 Montpellier, France
- Laboratoire de Biologie des Tumeurs Solides, Département de Pathologie et Oncobiologie, CHU Montpellier, Université de Montpellier, 34295 Montpellier, France; (J.A.V.); (J.S.)
| | - Julie A. Vendrell
- Laboratoire de Biologie des Tumeurs Solides, Département de Pathologie et Oncobiologie, CHU Montpellier, Université de Montpellier, 34295 Montpellier, France; (J.A.V.); (J.S.)
| | - Jérôme Solassol
- Laboratoire de Biologie des Tumeurs Solides, Département de Pathologie et Oncobiologie, CHU Montpellier, Université de Montpellier, 34295 Montpellier, France; (J.A.V.); (J.S.)
| |
Collapse
|
29
|
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes-the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Collapse
Affiliation(s)
| | - Brandon D. Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Karol Pal
- Penn State University, University Park, PA, USA
| | - Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
| | - Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | | | | | - Erich Bomberg
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
| | - Gerard G. Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y. Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Oregon Health & Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
| | - Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
| | | | | | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | | | | | | | - Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gage H. Garcia
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - David A. Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | - Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Karen H. Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | | | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Steven J. Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Sweetalana
- Penn State University, University Park, PA, USA
| | - Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Alice C. Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Xinru Zhang
- Penn State University, University Park, PA, USA
| | | | | | | | - Soojin V. Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | | | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Evan Eichler
- University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M. Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| |
Collapse
|
30
|
Pei T, Zhu S, Liao W, Fang Y, Liu J, Kong Y, Yan M, Cui M, Zhao Q. Gap-free genome assembly and CYP450 gene family analysis reveal the biosynthesis of anthocyanins in Scutellaria baicalensis. Hortic Res 2023; 10:uhad235. [PMID: 38156283 PMCID: PMC10753160 DOI: 10.1093/hr/uhad235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Accepted: 11/01/2023] [Indexed: 12/30/2023]
Abstract
Scutellaria baicalensis Georgi, a member of the Lamiaceae family, is a widely utilized medicinal plant. The flavones extracted from S. baicalensis contribute to numerous health benefits, including anti-inflammatory, antiviral, and anti-tumor activities. However, the incomplete genome assembly hinders biological studies on S. baicalensis. This study presents the first telomere-to-telomere (T2T) gap-free genome assembly of S. baicalensis through the integration of Pacbio HiFi, Nanopore ultra-long and Hi-C technologies. A total of 384.59 Mb of genome size with a contig N50 of 42.44 Mb was obtained, and all sequences were anchored into nine pseudochromosomes without any gap or mismatch. In addition, we analysed the major cyanidin- and delphinidin-based anthocyanins involved in the determination of blue-purple flower using a widely-targeted metabolome approach. Based on the genome-wide identification of Cytochrome P450 (CYP450) gene family, three genes (SbFBH1, 2, and 5) encoding flavonoid 3'-hydroxylases (F3'Hs) and one gene (SbFBH7) encoding flavonoid 3'5'-hydroxylase (F3'5'H) were found to hydroxylate the B-ring of flavonoids. Our studies enrich the genomic information available for the Lamiaceae family and provide a toolkit for discovering CYP450 genes involved in the flavonoid decoration.
Collapse
Affiliation(s)
- Tianlin Pei
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- State Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
| | - Sanming Zhu
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- National Key Laboratory of Wheat Improvement, College of Life Sciences, Shandong Agricultural University, Taian, 271018, China
| | - Weizhi Liao
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
| | - Yumin Fang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
| | - Jie Liu
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
| | - Yu Kong
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
| | - Mengxiao Yan
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
| | - Mengying Cui
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
| | - Qing Zhao
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, CAS Center for Excellence in Molecular Plant Sciences Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- State Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
| |
Collapse
|
31
|
Beeler JS, Bolton KL. How low can you go?: Methodologic considerations in clonal hematopoiesis variant calling. Leuk Res 2023; 135:107419. [PMID: 37956474 DOI: 10.1016/j.leukres.2023.107419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2023] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023]
Abstract
Clonal hematopoiesis (CH) is defined by the presence of an expanded clonal hematopoietic cell population due to an acquired mutation conferring a selective growth advantage and is known to predispose to hematologic malignancy. In this review, we discuss sequencing methods for CH detection in bulk sequencing data and corresponding bioinformatic approaches for variant calling, filtering, and curation. We detail practical recommendations for CH calling. Finally, we discuss how improvements in CH sequencing and bioinformatic approaches will enable the characterization of CH trajectories, its impact on human health, and therapeutic approaches to mitigate its adverse effects.
Collapse
Affiliation(s)
- J Scott Beeler
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Kelly L Bolton
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.
| |
Collapse
|
32
|
Morgan MA, Mohammad Parast S, Iwanaszko M, Aoi Y, Yoo D, Dumar ZJ, Howard BC, Helmin KA, Liu Q, Thakur WR, Zeidner JM, Singer BD, Eichler EE, Shilatifard A. ELOA3: A primate-specific RNA polymerase II elongation factor encoded by a tandem repeat gene cluster. Sci Adv 2023; 9:eadj1261. [PMID: 37992162 PMCID: PMC10664989 DOI: 10.1126/sciadv.adj1261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Accepted: 10/19/2023] [Indexed: 11/24/2023]
Abstract
The biological role of the repetitive DNA sequences in the human genome remains an outstanding question. Recent long-read human genome assemblies have allowed us to identify a function for one of these repetitive regions. We have uncovered a tandem array of conserved primate-specific retrogenes encoding the protein Elongin A3 (ELOA3), a homolog of the RNA polymerase II (RNAPII) elongation factor Elongin A (ELOA). Our genomic analysis shows that the ELOA3 gene cluster is conserved among primates and the number of ELOA3 gene repeats is variable in the human population and across primate species. Moreover, the gene cluster has undergone concerted evolution and homogenization within primates. Our biochemical studies show that ELOA3 functions as a promoter-associated RNAPII pause-release elongation factor with distinct biochemical and functional features from its ancestral homolog, ELOA. We propose that the ELOA3 gene cluster has evolved to fulfil a transcriptional regulatory function unique to the primate lineage that can be targeted to regulate cellular hyperproliferation.
Collapse
Affiliation(s)
- Marc A. J. Morgan
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Saeid Mohammad Parast
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Marta Iwanaszko
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Yuki Aoi
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA 98195, USA
| | - Zachary J. Dumar
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Benjamin C. Howard
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Kathryn A. Helmin
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Simpson Querrey Lung Institute for Translational Science, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Qianli Liu
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Simpson Querrey Lung Institute for Translational Science, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - William R. Thakur
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Jacob M. Zeidner
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Benjamin D. Singer
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Simpson Querrey Lung Institute for Translational Science, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Ali Shilatifard
- Department of Biochemistry and Molecular Genetics, Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| |
Collapse
|
33
|
Fleming AM, Zhu J, Done VK, Burrows CJ. Advantages and challenges associated with bisulfite-assisted nanopore direct RNA sequencing for modifications. RSC Chem Biol 2023; 4:952-964. [PMID: 37920399 PMCID: PMC10619145 DOI: 10.1039/d3cb00081h] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Accepted: 08/23/2023] [Indexed: 11/04/2023] Open
Abstract
Nanopore direct RNA sequencing is a technology that allows sequencing for epitranscriptomic modifications with the possibility of a quantitative assessment. In the present work, pseudouridine (Ψ) was sequenced with the nanopore before and after the pH 7 bisulfite reaction that yields stable ribose adducts at C1' of Ψ. The adducted sites produced greater base call errors in the form of deletion signatures compared to Ψ. Sequencing studies on E. coli rRNA and tmRNA before and after the pH 7 bisulfite reaction demonstrated that using chemically-assisted nanopore sequencing has distinct advantages for minimization of false positives and false negatives in the data. The rRNA from E. coli has 19 known U/C sequence variations that give similar base call signatures as Ψ, and therefore, are false positives when inspecting base call data; however, these sites are refractory to reacting with bisulfite as is easily observed in nanopore data. The E. coli tmRNA has a low occupancy Ψ in a pyrimidine-rich sequence context that is called a U representing a false negative; partial occupancy by Ψ is revealed after the bisulfite reaction. In a final study, 5-methylcytidine (m5C) in RNA can readily be observed after the pH 5 bisulfite reaction in which the parent C deaminates to U and the modified site does not react. This locates m5C when using bisulfite-assisted nanopore direct RNA sequencing, which is otherwise challenging to observe. The advantages and challenges of the overall approach are discussed.
Collapse
Affiliation(s)
- Aaron M Fleming
- Department of Chemistry, University of Utah 315 S. 1400 East Salt Lake City UT 84112-0850 USA
| | - Judy Zhu
- Department of Chemistry, University of Utah 315 S. 1400 East Salt Lake City UT 84112-0850 USA
| | - Vilhelmina K Done
- Department of Chemistry, University of Utah 315 S. 1400 East Salt Lake City UT 84112-0850 USA
| | - Cynthia J Burrows
- Department of Chemistry, University of Utah 315 S. 1400 East Salt Lake City UT 84112-0850 USA
| |
Collapse
|
34
|
English A, Dolzhenko E, Jam HZ, Mckenzie S, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Benchmarking of small and large variants across tandem repeats. bioRxiv 2023:2023.10.29.564632. [PMID: 37961319 PMCID: PMC10634962 DOI: 10.1101/2023.10.29.564632] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.
Collapse
|
35
|
Varabyou A, Sommer MJ, Erdogdu B, Shinder I, Minkin I, Chao KH, Park S, Heinz J, Pockrandt C, Shumate A, Rincon N, Puiu D, Steinegger M, Salzberg SL, Pertea M. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol 2023; 24:249. [PMID: 37904256 PMCID: PMC10614308 DOI: 10.1186/s13059-023-03088-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess .
Collapse
Affiliation(s)
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
| | - Markus J Sommer
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Beril Erdogdu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Ida Shinder
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Cross Disciplinary Graduate Program in Biomedical Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Ilia Minkin
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Kuan-Hao Chao
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Sukhwan Park
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
| | - Jakob Heinz
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Christopher Pockrandt
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Alaina Shumate
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Natalia Rincon
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Daniela Puiu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
| | - Steven L Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
| |
Collapse
|
36
|
Falker-Gieske C. Transcriptome driven discovery of novel candidate genes for human neurological disorders in the telomer-to-telomer genome assembly era. Hum Genomics 2023; 17:94. [PMID: 37872607 PMCID: PMC10594789 DOI: 10.1186/s40246-023-00543-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Accepted: 10/17/2023] [Indexed: 10/25/2023] Open
Abstract
BACKGROUND With the first complete draft of a human genome, the Telomere-to-Telomere Consortium unlocked previously concealed genomic regions for genetic analyses. These regions harbour nearly 2000 potential novel genes with unknown function. In order to uncover candidate genes associated with human neurological pathologies, a comparative transcriptome study using the T2T-CHM13 and the GRCh38 genome assemblies was conducted on previously published datasets for eight distinct human neurological disorders. RESULTS The analysis of differential expression in RNA sequencing data led to the identification of 336 novel candidate genes linked to human neurological disorders. Additionally, it was revealed that, on average, 3.6% of the differentially expressed genes detected with the GRCh38 assembly may represent potential false positives. Among the noteworthy findings, two novel genes were discovered, one encoding a pore-structured protein and the other a highly ordered β-strand-rich protein. These genes exhibited upregulation in multiple epilepsy datasets and hold promise as candidate genes potentially modulating the progression of the disease. Furthermore, an analysis of RNA derived from white matter lesions in multiple sclerosis patients indicated significant upregulation of 26 rRNA encoding genes. Additionally, putative pathology related genes were identified for Alzheimer's disease, amyotrophic lateral sclerosis, glioblastoma, glioma, and conditions resulting from the m.3242 A > G mtDNA mutation. CONCLUSION The results presented here underline the potential of the T2T-CHM13 assembly in facilitating the discovery of candidate genes from transcriptome data in the context of human disorders. Moreover, the results demonstrate the value of remapping sequencing data to a superior genome assembly. Numerous potential pathology related genes, either as causative factors or related elements, have been unveiled, warranting further experimental validation.
Collapse
Affiliation(s)
- Clemens Falker-Gieske
- Division of Functional Breeding, Department of Animal Sciences, Georg-August-Universität Göttingen, Burckhardtweg 2, 37077, Göttingen, Germany.
| |
Collapse
|
37
|
Liu J, Liu F, Pan W. Improving the Completeness of Chromosome-Level Assembly by Recalling Sequences from Lost Contigs. Genes (Basel) 2023; 14:1926. [PMID: 37895275 PMCID: PMC10606404 DOI: 10.3390/genes14101926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Revised: 09/13/2023] [Accepted: 09/20/2023] [Indexed: 10/29/2023] Open
Abstract
For a long time, the construction of complete reference genomes for complex eukaryotic genomes has been hindered by the limitations of sequencing technologies. Recently, the Pacific Biosciences (PacBio) HiFi data and Oxford Nanopore Technologies (ONT) Ultra-Long data, leveraging their respective advantages in accuracy and length, have provided an opportunity for generating complete chromosome sequences. Nevertheless, for the majority of genomes, the chromosome-level assemblies generated using existing methods still miss a high proportion of sequences due to losing small contigs in the step of assembly and scaffolding. To address this shortcoming, in this paper, we propose a novel method that is able to identify and fill the gaps in the chromosome-level assembly by recalling the sequences in the lost small contigs. Experimental results on both real and simulated datasets demonstrate that this method is able to improve the completeness of the chromosome-level assembly.
Collapse
Affiliation(s)
- Junyang Liu
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China;
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences (ICR, CAAS), Shenzhen 518120, China
| | - Fang Liu
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China;
- National Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR, CAAS), Anyang 455000, China
| | - Weihua Pan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences (ICR, CAAS), Shenzhen 518120, China
| |
Collapse
|
38
|
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Collapse
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | | | | | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| |
Collapse
|
39
|
Xie S, Isaacs K, Becker G, Murdoch BM. A computational framework for improving genetic variants identification from 5,061 sheep sequencing data. J Anim Sci Biotechnol 2023; 14:127. [PMID: 37779189 PMCID: PMC10544426 DOI: 10.1186/s40104-023-00923-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2023] [Accepted: 08/01/2023] [Indexed: 10/03/2023] Open
Abstract
BACKGROUND Pan-genomics is a recently emerging strategy that can be utilized to provide a more comprehensive characterization of genetic variation. Joint calling is routinely used to combine identified variants across multiple related samples. However, the improvement of variants identification using the mutual support information from multiple samples remains quite limited for population-scale genotyping. RESULTS In this study, we developed a computational framework for joint calling genetic variants from 5,061 sheep by incorporating the sequencing error and optimizing mutual support information from multiple samples' data. The variants were accurately identified from multiple samples by using four steps: (1) Probabilities of variants from two widely used algorithms, GATK and Freebayes, were calculated by Poisson model incorporating base sequencing error potential; (2) The variants with high mapping quality or consistently identified from at least two samples by GATK and Freebayes were used to construct the raw high-confidence identification (rHID) variants database; (3) The high confidence variants identified in single sample were ordered by probability value and controlled by false discovery rate (FDR) using rHID database; (4) To avoid the elimination of potentially true variants from rHID database, the variants that failed FDR were reexamined to rescued potential true variants and ensured high accurate identification variants. The results indicated that the percent of concordant SNPs and Indels from Freebayes and GATK after our new method were significantly improved 12%-32% compared with raw variants and advantageously found low frequency variants of individual sheep involved several traits including nipples number (GPC5), scrapie pathology (PAPSS2), seasonal reproduction and litter size (GRM1), coat color (RAB27A), and lentivirus susceptibility (TMEM154). CONCLUSION The new method used the computational strategy to reduce the number of false positives, and simultaneously improve the identification of genetic variants. This strategy did not incur any extra cost by using any additional samples or sequencing data information and advantageously identified rare variants which can be important for practical applications of animal breeding.
Collapse
Affiliation(s)
- Shangqian Xie
- Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA
| | | | - Gabrielle Becker
- Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA
| | - Brenda M Murdoch
- Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA.
| |
Collapse
|
40
|
Li B, Yang Q, Yang L, Zhou X, Deng L, Qu L, Guo D, Hui R, Guo Y, Liu X, Wang T, Fan L, Li M, Yan M. A gap-free reference genome reveals structural variations associated with flowering time in rapeseed ( Brassica napus). Hortic Res 2023; 10:uhad171. [PMID: 37841499 PMCID: PMC10569240 DOI: 10.1093/hr/uhad171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Accepted: 08/20/2023] [Indexed: 10/17/2023]
Abstract
Allopolyploid oilseed rape (Brassica napus) is an important oil crop and vegetable. However, the latest version of its reference genome, with collapsed duplications, gaps, and other issues, prevents comprehensive genomic analysis. Herein, we report a gap-free assembly of the rapeseed cv. Xiang5A genome using a combination of ONT (Oxford Nanopore Technologies) ultra-long reads, PacBio high-fidelity reads, and Hi-C datasets. It includes gap-free assemblies of all 19 chromosomes and telomere-to-telomere assemblies of eight chromosomes. Compared with previously published genomes of B. napus, our gap-free genome, with a contig N50 length of 50.70 Mb, has complete assemblies of 9 of 19 chromosomes without manual intervention, and greatly improves contiguity and completeness, thereby representing the highest quality genome assembly to date. Our results revealed that B. napus Xiang5A underwent nearly complete triplication and allotetraploidy relative to Arabidopsis thaliana. Using the gap-free assembly, we found that 917 flowering-related genes were affected by structural variation, including BnaA03.VERNALIZATION INSENSITIVE 3 and BnaC04.HIGH EXPRESSION OF OSMOTICALLY RESPONSIVE GENES 1. These genes may play crucial roles in regulating flowering time and facilitating the adaptation of Xiang5A in the Yangtze River Basin of China. This reference genome provides a valuable genetic resource for rapeseed functional genomic studies and breeding.
Collapse
Affiliation(s)
- Bao Li
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Qian Yang
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Lulu Yang
- Department of Cell Biology and Genetics, Wuhan Benagen Tech Solutions Company Limited, Wuhan, Hubei 430021, China
| | - Xing Zhou
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Lichao Deng
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Liang Qu
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Dengli Guo
- Department of Cell Biology and Genetics, Wuhan Benagen Tech Solutions Company Limited, Wuhan, Hubei 430021, China
| | - Rongkui Hui
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Yiming Guo
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Xinhong Liu
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Tonghua Wang
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Lianyi Fan
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Mei Li
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| | - Mingli Yan
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Hunan Hybrid Rapeseed Engineering and Technology Research Center, Changsha, Hunan 410125, China
| |
Collapse
|
41
|
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023] Open
Abstract
Since the release of the complete human genome, the priority of human genomic study has now been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as reference respectively showed that the reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miss-called with different reference genomes. Finally, applying the CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian specific ones undetected using CHM13 as the reference. Our analyses reveal the advances of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Collapse
Affiliation(s)
- Chentao Yang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yang Zhou
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI Research-Wuhan, BGI, Wuhan, Hubei, China
| | - Yanni Song
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Dongya Wu
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
| | - Yan Zeng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Lei Nie
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Guangji Chen
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Jinjin Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Hongling Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Long Zhou
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xiaobo Qian
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Chenlu Liu
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | | | | | - Wei Dai
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Mengyang Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yanwei Qi
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Xiaobo Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Lidong Guo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Aijun Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yuan Deng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yong Zhang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Yunqiu He
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Chunxue Guo
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Hangzhou, Hangzhou, Zhejiang, China
| | - Guoji Guo
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
| | - Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
| |
Collapse
|
42
|
Lowther C, Valkanas E, Giordano JL, Wang HZ, Currall BB, O'Keefe K, Pierce-Hoffman E, Kurtas NE, Whelan CW, Hao SP, Weisburd B, Jalili V, Fu J, Wong I, Collins RL, Zhao X, Austin-Tse CA, Evangelista E, Lemire G, Aggarwal VS, Lucente D, Gauthier LD, Tolonen C, Sahakian N, Stevens C, An JY, Dong S, Norton ME, MacKenzie TC, Devlin B, Gilmore K, Powell BC, Brandt A, Vetrini F, DiVito M, Sanders SJ, MacArthur DG, Hodge JC, O'Donnell-Luria A, Rehm HL, Vora NL, Levy B, Brand H, Wapner RJ, Talkowski ME. Systematic evaluation of genome sequencing for the diagnostic assessment of autism spectrum disorder and fetal structural anomalies. Am J Hum Genet 2023; 110:1454-1469. [PMID: 37595579 PMCID: PMC10502737 DOI: 10.1016/j.ajhg.2023.07.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2023] [Revised: 07/25/2023] [Accepted: 07/25/2023] [Indexed: 08/20/2023] Open
Abstract
Short-read genome sequencing (GS) holds the promise of becoming the primary diagnostic approach for the assessment of autism spectrum disorder (ASD) and fetal structural anomalies (FSAs). However, few studies have comprehensively evaluated its performance against current standard-of-care diagnostic tests: karyotype, chromosomal microarray (CMA), and exome sequencing (ES). To assess the clinical utility of GS, we compared its diagnostic yield against these three tests in 1,612 quartet families including an individual with ASD and in 295 prenatal families. Our GS analytic framework identified a diagnostic variant in 7.8% of ASD probands, almost 2-fold more than CMA (4.3%) and 3-fold more than ES (2.7%). However, when we systematically captured copy-number variants (CNVs) from the exome data, the diagnostic yield of ES (7.4%) was brought much closer to, but did not surpass, GS. Similarly, we estimated that GS could achieve an overall diagnostic yield of 46.1% in unselected FSAs, representing a 17.2% increased yield over karyotype, 14.1% over CMA, and 4.1% over ES with CNV calling or 36.1% increase without CNV discovery. Overall, GS provided an added diagnostic yield of 0.4% and 0.8% beyond the combination of all three standard-of-care tests in ASD and FSAs, respectively. This corresponded to nine GS unique diagnostic variants, including sequence variants in exons not captured by ES, structural variants (SVs) inaccessible to existing standard-of-care tests, and SVs where the resolution of GS changed variant classification. Overall, this large-scale evaluation demonstrated that GS significantly outperforms each individual standard-of-care test while also outperforming the combination of all three tests, thus warranting consideration as the first-tier diagnostic approach for the assessment of ASD and FSAs.
Collapse
Affiliation(s)
- Chelsea Lowther
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Elise Valkanas
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Biological and Biomedical Sciences, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
| | - Jessica L Giordano
- Department of Obstetrics & Gynecology, Columbia University Medical Center, New York, NY, USA
| | - Harold Z Wang
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Benjamin B Currall
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Kathryn O'Keefe
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Emma Pierce-Hoffman
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nehir E Kurtas
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Christopher W Whelan
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Stephanie P Hao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ben Weisburd
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vahid Jalili
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jack Fu
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Isaac Wong
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Christina A Austin-Tse
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Pathology, Harvard Medical School, Boston, MA, USA
| | - Emily Evangelista
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gabrielle Lemire
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vimla S Aggarwal
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Diane Lucente
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Laura D Gauthier
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Charlotte Tolonen
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nareh Sahakian
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Christine Stevens
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joon-Yong An
- School of Biosystem and Biomedical Science, Korea University, Seoul, South Korea
| | - Shan Dong
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Mary E Norton
- Center for Maternal-Fetal Precision Medicine, University of California, San Francisco, San Francisco, CA, USA; Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Francisco, San Francisco, California, USA
| | - Tippi C MacKenzie
- Center for Maternal-Fetal Precision Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Kelly Gilmore
- Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Bradford C Powell
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Alicia Brandt
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Francesco Vetrini
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Michelle DiVito
- Department of Obstetrics & Gynecology, Columbia University Medical Center, New York, NY, USA
| | - Stephan J Sanders
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Daniel G MacArthur
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Centre for Population Genomics, Garvan Institute of Medical Research, and University of New South Wales Sydney, Sydney, NSW, Australia; Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, VIC, Australia
| | - Jennelle C Hodge
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Anne O'Donnell-Luria
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Neeta L Vora
- Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Brynn Levy
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Ronald J Wapner
- Department of Obstetrics & Gynecology, Columbia University Medical Center, New York, NY, USA
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA; Program in Biological and Biomedical Sciences, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA; Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA.
| |
Collapse
|
43
|
van Dijk EL, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C. Genomics in the long-read sequencing era. Trends Genet 2023; 39:649-671. [PMID: 37230864 DOI: 10.1016/j.tig.2023.04.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.
Collapse
Affiliation(s)
- Erwin L van Dijk
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
| | - Delphine Naquin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Kévin Gorrichon
- National Center of Human Genomics Research (CNRGH), 91000 Évry-Courcouronnes, France
| | - Yan Jaszczyszyn
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Rania Ouazahrou
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Claude Thermes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Céline Hernandez
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| |
Collapse
|
44
|
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, Phillippy AM. The complete sequence of a human Y chromosome. Nature 2023; 621:344-354. [PMID: 37612512 PMCID: PMC10752217 DOI: 10.1038/s41586-023-06457-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2022] [Accepted: 07/19/2023] [Indexed: 08/25/2023]
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications1-3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished4,5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome4 and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Collapse
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies Inc., Oxford, UK
| | - Monika Cechova
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A Alexandrov
- Federal Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Chen-Shan Chin
- GeneDX Holdings Corp, Stamford, CT, USA
- Foundation of Biological Data Science, Belmont, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | | | | | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ariel Gershman
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer L Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, MO, USA
| | - Patrick G S Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Nancy F Hansen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Gabrielle A Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jakob Heinz
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore, MD, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Christopher Markovic
- Genome Technology Access Center at the McDonnell Genome Institute, Washington University, St. Louis, MO, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann M Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brandy M McNulty
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E Olsen
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Steven L Salzberg
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | | | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Angela M Taravella Oill
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Department of Biomedical Engineering, Pennsylvania State University, State College, PA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison C Watwood
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | | | | | - Melissa A Wilson
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Investigator, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| | - Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
Collapse
|
45
|
Wang J, Lu L, Zheng S, Wang D, Jin L, Zhang Q, Li M, Zhang Z. DeCOOC Deconvoluted Hi-C Map Characterizes the Chromatin Architecture of Cells in Physiologically Distinctive Tissues. Adv Sci (Weinh) 2023; 10:e2301058. [PMID: 37515382 PMCID: PMC10520690 DOI: 10.1002/advs.202301058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 07/06/2023] [Indexed: 07/30/2023]
Abstract
Deciphering variations in chromosome conformations based on bulk three-dimensional (3D) genomic data from heterogenous tissues is a key to understanding cell-type specific genome architecture and dynamics. Surprisingly, computational deconvolution methods for high-throughput chromosome conformation capture (Hi-C) data remain very rare in the literature. Here, a deep convolutional neural network (CNN), deconvolve bulk Hi-C data (deCOOC) that remarkably outperformed all the state-of-the-art tools in the deconvolution task is developed. Interestingly, it is noticed that the chromatin accessibility or the Hi-C contact frequency alone is insufficient to explain the power of deCOOC, suggesting the existence of a latent embedded layer of information pertaining to the cell type specific 3D genome architecture. By applying deCOOC to in-house-generated bulk Hi-C data from visceral and subcutaneous adipose tissues, it is found that the characteristic chromatin features of M2 cells in the two anatomical loci are distinctively bound to different physiological functionalities. Taken together, deCOOC is both a reliable Hi-C data deconvolution method and a powerful tool for functional extraction of 3D genome architecture.
Collapse
Affiliation(s)
- Junmei Wang
- CAS Key Laboratory of Genome Sciences and InformationBeijing Institute of GenomicsChinese Academy of Sciences and China National Center for BioinformationBeijing100101China
- School of Life ScienceUniversity of Chinese Academy of SciencesBeijing100049China
| | - Lu Lu
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural AffairsCollege of Animal Science and TechnologySichuan Agricultural UniversityChengdu611130China
- Animal Breeding and Genetics Key Laboratory of Sichuan ProvinceInstitute of Animal Genetics and BreedingSichuan Agricultural UniversityChengdu611130China
| | - Shiqi Zheng
- CAS Key Laboratory of Genome Sciences and InformationBeijing Institute of GenomicsChinese Academy of Sciences and China National Center for BioinformationBeijing100101China
- School of Life ScienceUniversity of Chinese Academy of SciencesBeijing100049China
| | - Danyang Wang
- CAS Key Laboratory of Genome Sciences and InformationBeijing Institute of GenomicsChinese Academy of Sciences and China National Center for BioinformationBeijing100101China
- School of Life ScienceUniversity of Chinese Academy of SciencesBeijing100049China
- Sars‐Fang Centre & MOE Key Laboratory of Marine Genetics and BreedingCollege of Marine Life SciencesOcean University of ChinaQingdao266100China
| | - Long Jin
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural AffairsCollege of Animal Science and TechnologySichuan Agricultural UniversityChengdu611130China
- Animal Breeding and Genetics Key Laboratory of Sichuan ProvinceInstitute of Animal Genetics and BreedingSichuan Agricultural UniversityChengdu611130China
| | - Qing Zhang
- CAS Key Laboratory of Genome Sciences and InformationBeijing Institute of GenomicsChinese Academy of Sciences and China National Center for BioinformationBeijing100101China
| | - Mingzhou Li
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural AffairsCollege of Animal Science and TechnologySichuan Agricultural UniversityChengdu611130China
- Animal Breeding and Genetics Key Laboratory of Sichuan ProvinceInstitute of Animal Genetics and BreedingSichuan Agricultural UniversityChengdu611130China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and InformationBeijing Institute of GenomicsChinese Academy of Sciences and China National Center for BioinformationBeijing100101China
- School of Life ScienceUniversity of Chinese Academy of SciencesBeijing100049China
| |
Collapse
|
46
|
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Collapse
Affiliation(s)
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; ,
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert P Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; ,
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
Collapse
|
47
|
Sharafutdinov I, Tegtmeyer N, Linz B, Rohde M, Vieth M, Tay ACY, Lamichhane B, Tuan VP, Fauzia KA, Sticht H, Yamaoka Y, Marshall BJ, Backert S. A single-nucleotide polymorphism in Helicobacter pylori promotes gastric cancer development. Cell Host Microbe 2023; 31:1345-1358.e6. [PMID: 37490912 DOI: 10.1016/j.chom.2023.06.016] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Revised: 05/23/2023] [Accepted: 06/27/2023] [Indexed: 07/27/2023]
Abstract
Single-nucleotide polymorphisms (SNPs) in various human genes are key factors in carcinogenesis. However, whether SNPs in bacterial pathogens are similarly crucial in cancer development is unknown. Here, we analyzed 1,043 genomes of the stomach pathogen Helicobacter pylori and pinpointed a SNP in the serine protease HtrA (position serine/leucine 171) that significantly correlates with gastric cancer. Our functional studies reveal that the 171S-to-171L mutation triggers HtrA trimer formation and enhances proteolytic activity and cleavage of epithelial junction proteins occludin and tumor-suppressor E-cadherin. 171L-type HtrA, but not 171S-HtrA-possessing H. pylori, inflicts severe epithelial damage, enhances injection of oncoprotein CagA into epithelial cells, increases NF-κB-mediated inflammation and cell proliferation through nuclear accumulation of β-catenin, and promotes host DNA double-strand breaks, collectively triggering malignant changes. These findings highlight the 171S/L HtrA mutation as a unique bacterial cancer-associated SNP and as a potential biomarker for risk predictions in H. pylori infections.
Collapse
Affiliation(s)
- Irshad Sharafutdinov
- Division of Microbiology, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
| | - Nicole Tegtmeyer
- Division of Microbiology, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
| | - Bodo Linz
- Division of Microbiology, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
| | - Manfred Rohde
- Central Facility for Microscopy, Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany
| | - Michael Vieth
- Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Klinikum Bayreuth, 95445 Bayreuth, Germany
| | - Alfred Chin-Yen Tay
- Helicobacter Research Laboratory, Marshall Centre for Infectious Diseases Research and Training, School of Pathology and Laboratory Medicine, University of Western Australia, 6009 Perth, Australia
| | - Binit Lamichhane
- Helicobacter Research Laboratory, Marshall Centre for Infectious Diseases Research and Training, School of Pathology and Laboratory Medicine, University of Western Australia, 6009 Perth, Australia
| | - Vo Phuoc Tuan
- Department of Endoscopy, Choray Hospital, Ho Chi Minh, Vietnam; Department of Environmental and Preventive Medicine, Faculty of Medicine, Oita University, Yufu, Oita, Japan
| | - Kartika Afrida Fauzia
- Department of Environmental and Preventive Medicine, Faculty of Medicine, Oita University, Yufu, Oita, Japan; Department of Public Health and Preventive Medicine, Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
| | - Heinrich Sticht
- Division of Bioinformatics, Institute of Biochemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Yoshio Yamaoka
- Department of Environmental and Preventive Medicine, Faculty of Medicine, Oita University, Yufu, Oita, Japan; Department of Medicine, Gastroenterology and Hepatology Section, Baylor College of Medicine, Houston, TX 77030, USA
| | - Barry J Marshall
- Helicobacter Research Laboratory, Marshall Centre for Infectious Diseases Research and Training, School of Pathology and Laboratory Medicine, University of Western Australia, 6009 Perth, Australia; University of Western Australia, Marshall Centre, M504, Crawley, WA, Australia; Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Health Science Center, Shenzhen, China
| | - Steffen Backert
- Division of Microbiology, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany.
| |
Collapse
|
48
|
Wojcik MH, Reuter CM, Marwaha S, Mahmoud M, Duyzend MH, Barseghyan H, Yuan B, Boone PM, Groopman EE, Délot EC, Jain D, Sanchis-Juan A, Starita LM, Talkowski M, Montgomery SB, Bamshad MJ, Chong JX, Wheeler MT, Berger SI, O'Donnell-Luria A, Sedlazeck FJ, Miller DE. Beyond the exome: What's next in diagnostic testing for Mendelian conditions. Am J Hum Genet 2023; 110:1229-1248. [PMID: 37541186 PMCID: PMC10432150 DOI: 10.1016/j.ajhg.2023.06.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2023] [Revised: 06/13/2023] [Accepted: 06/14/2023] [Indexed: 08/06/2023] Open
Abstract
Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.
Collapse
Affiliation(s)
- Monica H Wojcik
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Division of Newborn Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Chloe M Reuter
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Shruti Marwaha
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Michael H Duyzend
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Hayk Barseghyan
- Center for Genetics Medicine Research, Children's National Research Institute, Children's National Hospital, Washington, DC 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
| | - Bo Yuan
- Department of Molecular and Human Genetics and Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Philip M Boone
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Emily E Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Emmanuèle C Délot
- Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA; Center for Genetics Medicine Research, Children's National Research and Innovation Campus, Washington, DC, USA; Department of Pediatrics, George Washington University, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
| | - Deepti Jain
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, USA
| | - Alba Sanchis-Juan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lea M Starita
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Michael Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Stephen B Montgomery
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Michael J Bamshad
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
| | - Jessica X Chong
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
| | - Matthew T Wheeler
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Seth I Berger
- Center for Genetics Medicine Research and Rare Disease Institute, Children's National Hospital, Washington, DC 20010, USA
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA; Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
| | - Danny E Miller
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA.
| |
Collapse
|
49
|
Yu H, Zheng Z, Su J, Lam TW, Luo R. Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP. BMC Bioinformatics 2023; 24:308. [PMID: 37537536 PMCID: PMC10401749 DOI: 10.1186/s12859-023-05434-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Accepted: 07/31/2023] [Indexed: 08/05/2023] Open
Abstract
BACKGROUND With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data. RESULTS We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https://github.com/HKU-BAL/Clair3-MP . CONCLUSIONS These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications.
Collapse
Affiliation(s)
- Huijing Yu
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
| | - Zhenxian Zheng
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
| | - Junhao Su
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China.
| | - Tak-Wah Lam
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China.
| | - Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China.
| |
Collapse
|
50
|
Chin CS, Behera S, Khalak A, Sedlazeck FJ, Sudmant PH, Wagner J, Zook JM. Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nat Methods 2023; 20:1213-1221. [PMID: 37365340 PMCID: PMC10406601 DOI: 10.1038/s41592-023-01914-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2022] [Accepted: 05/17/2023] [Indexed: 06/28/2023]
Abstract
Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreting variation at various scales, from smaller tandem repeats to megabase rearrangements, across many human genomes. We present a PanGenome Research Tool Kit (PGR-TK) enabling analyses of complex pangenome structural and haplotype variation at multiple scales. We apply the graph decomposition methods in PGR-TK to the class II major histocompatibility complex demonstrating the importance of the human pangenome for analyzing complicated regions. Moreover, we investigate the Y-chromosome genes, DAZ1/DAZ2/DAZ3/DAZ4, of which structural variants have been linked to male infertility, and X-chromosome genes OPN1LW and OPN1MW linked to eye disorders. We further showcase PGR-TK across 395 complex repetitive medically important genes. This highlights the power of PGR-TK to resolve complex variation in regions of the genome that were previously too complex to analyze.
Collapse
Affiliation(s)
- Chen-Shan Chin
- GeneDX, Stamford, CT, USA.
- Foundation of Biological Data Science, Belmont, CA, USA.
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Asif Khalak
- Foundation of Biological Data Science, Belmont, CA, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Peter H Sudmant
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| |
Collapse
|