101
Zhang Z, An HH, Vege S, Hu T, Zhang S, Mosbruger T, Jayaraman P, Monos D, Westhoff CM, Chou ST. Accurate long-read sequencing allows assembly of the duplicated RHD and RHCE genes harboring variants relevant to blood transfusion. Am J Hum Genet 2022; 109:180-191. [PMID: 34968422 DOI: 10.1016/j.ajhg.2021.12.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/26/2021] [Accepted: 12/07/2021] [Indexed: 12/18/2022]
Abstract
Next-generation sequencing (NGS) technologies have transformed medical genetics. However, short read lengths limit the identification of structural variants, the sequencing of repetitive regions, the phasing of distant nucleotide changes, and the distinction between highly homologous genomic regions. Long-read sequencing technologies may improve the characterization of genes that are currently difficult to assess. We used a combination of targeted DNA capture, long-read sequencing, and a customized bioinformatics pipeline to fully assemble the RH region, which harbors variation relevant to red cell donor-recipient mismatch, particularly among patients with sickle cell disease. RHD and RHCE are a pair of duplicated genes located within an ∼175 kb region on human chromosome 1; they have high sequence similarity and frequent structural variations. To achieve the assembly, we used palindrome repeats in PacBio SMRT reads to obtain consensus sequences of 2.1 to 2.9 kb average length with over 99% accuracy. We used these long consensus sequences to identify 771 assembly markers and to phase the RHD-RHCE region with high confidence. The dataset enabled direct linkage between coding and intronic variants, phasing of distant SNPs to determine RHD-RHCE haplotypes, and identification of known and novel structural variations along with their breakpoints. Phasing is limited by the frequency of heterozygous assembly markers and was therefore most successful in samples from African Black individuals, who have increased heterogeneity at the RH locus. Overall, this approach allows RH genotyping and de novo assembly in the unbiased and comprehensive manner needed to extend NGS technology to high-resolution RH typing.
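The consensus step above exploits a property of circular SMRT reads: the polymerase passes over the same insert several times, alternating between the forward and reverse strands. The sketch below shows that idea in Python. It assumes the passes have already been split and aligned to equal length. The function names are hypothetical, and production consensus tools use alignment plus a probabilistic model rather than a plain vote.

```python
from collections import Counter

# Complement table for the four DNA bases (no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def majority_consensus(passes: list[str], strands: list[str]) -> str:
    """Column-wise majority vote over pre-aligned, equal-length passes.

    Passes read from the reverse strand ('-') are reverse-complemented first,
    so every pass is compared in the forward orientation.
    """
    if len(passes) != len(strands):
        raise ValueError("need one strand label per pass")
    oriented = [reverse_complement(p) if s == "-" else p
                for p, s in zip(passes, strands)]
    if len({len(p) for p in oriented}) != 1:
        raise ValueError("this sketch assumes pre-aligned, equal-length passes")
    # most_common(1) picks the most frequent base in each column;
    # ties fall to the base seen first.
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*oriented))
```

For example, `majority_consensus(["ACGTAC", "GTACGT", "ACCTAC"], ["+", "-", "+"])` returns `"ACGTAC"`. The reverse-strand pass agrees once it is reverse-complemented, and the single error in the third pass is outvoted.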
Affiliation(s)
- Zhe Zhang
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Hyun Hyung An
- Division of Hematology, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Sunitha Vege
- Immunohematology and Genomics, New York Blood Center, New York, NY 11101, USA
- Taishan Hu
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Shiping Zhang
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Timothy Mosbruger
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Pushkala Jayaraman
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Dimitri Monos
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Connie M Westhoff
- Immunohematology and Genomics, New York Blood Center, New York, NY 11101, USA
- Stella T Chou
- Division of Hematology, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Transfusion Medicine, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
102
Athanasopoulou K, Boti MA, Adamopoulos PG, Skourou PC, Scorilas A. Third-Generation Sequencing: The Spearhead towards the Radical Transformation of Modern Genomics. Life (Basel) 2021; 12:life12010030. [PMID: 35054423 PMCID: PMC8780579 DOI: 10.3390/life12010030] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Received: 11/15/2021] [Revised: 12/20/2021] [Accepted: 12/23/2021] [Indexed: 12/14/2022]
Abstract
Although next-generation sequencing (NGS) technology revolutionized sequencing, offering tremendous capacity with groundbreaking depth and accuracy, it continues to show serious limitations. In the early 2010s, a novel set of sequencing methodologies introduced by two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), gave rise to third-generation sequencing (TGS). These long-read technologies make genome sequencing easier to handle. They greatly reduce the average time of library construction workflows and simplify de novo genome assembly by generating long reads. Long reads from both TGS methodologies have already advanced transcriptional profiling, because they enable the identification of full-length transcripts without assembly or sophisticated bioinformatics tools. Long-read technologies have also provided new insights into epitranscriptomics by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies and discusses their limitations. It also provides an in-depth comparison of their scientific background and available protocols, as well as their potential utility in research and clinical applications.
103
Baslan T, Kovaka S, Sedlazeck FJ, Zhang Y, Wappel R, Tian S, Lowe SW, Goodwin S, Schatz MC. High resolution copy number inference in cancer using short-molecule nanopore sequencing. Nucleic Acids Res 2021; 49:e124. [PMID: 34551429 PMCID: PMC8643650 DOI: 10.1093/nar/gkab812] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/02/2021] [Revised: 07/19/2021] [Accepted: 09/09/2021] [Indexed: 01/23/2023]
Abstract
Genome copy number is an important source of genetic variation in health and disease. In cancer, copy number alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, owing to lower instrument cost, higher portability, and ease of use. Nonetheless, nanopore devices yield fewer sequencing reads/molecules than short-read platforms, which limits CNA inference accuracy. To address this limitation, we sequenced short DNA molecules loaded at an optimized concentration to increase the read/molecule yield of a single nanopore run. We show that short-molecule nanopore sequencing reproducibly returns high read counts and allows high-quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data show that short-molecule nanopore sequencing returns more sensitive and accurate copy number information than traditional approaches such as chromosome analysis/cytogenetics, and does so in a cost-effective and expeditious manner, including for multiplexed samples. Our results provide a framework for short-molecule nanopore sequencing with applications in research and medicine, including, but not limited to, CNA inference.
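The dependence on read count described above follows from the standard read-depth approach to CNA inference. Reads are counted in genomic bins, and each bin's count relative to a baseline estimates its copy number, so more reads per bin mean less sampling noise. The sketch below assumes a diploid baseline. It omits the GC/mappability correction and segmentation steps that real CNA pipelines apply, and the function names are illustrative rather than the authors' pipeline.

```python
import math
from statistics import median


def bin_counts(read_starts: list[int], chrom_length: int, bin_size: int) -> list[int]:
    """Count reads whose start coordinate falls in each fixed-width bin."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore out-of-range coordinates
            counts[pos // bin_size] += 1
    return counts


def copy_number_estimates(counts: list[int], ploidy: int = 2) -> list[float]:
    """Scale bin counts so that the median bin corresponds to `ploidy` copies."""
    med = median(counts)
    if med == 0:
        raise ValueError("median bin count is zero; use larger bins or more reads")
    return [ploidy * c / med for c in counts]
```

With counts of `[100, 100, 100, 150, 50]`, the median bin is taken as two copies, giving estimates of `[2.0, 2.0, 2.0, 3.0, 1.0]`. The fourth bin looks like a single-copy gain and the fifth like a single-copy loss.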
Affiliation(s)
- Timour Baslan
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yanming Zhang
- Cytogenetics Laboratory, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Sha Tian
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Scott W Lowe
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA
104
Righini M, Costa J, Zhou W. DNA bridges: A novel platform for single-molecule sequencing and other DNA-protein interaction applications. PLoS One 2021; 16:e0260428. [PMID: 34807931 PMCID: PMC8608331 DOI: 10.1371/journal.pone.0260428] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Accepted: 11/10/2021] [Indexed: 01/22/2023]
Abstract
DNA molecular combing is a technique that stretches thousands of long individual DNA molecules (up to 10 Mbp) into a parallel configuration on a surface. It has previously been proposed to sequence these molecules by synthesis. However, this approach poses two critical challenges: (1) combed DNA molecules are overstretched and are therefore a suboptimal substrate for polymerase extension, and (2) the combing surface sterically impedes full enzymatic access to the DNA backbone. Here, we introduce a novel approach that attaches thousands of molecules to a removable surface with a tunable stretching factor. We then dissolve portions of the surface, leaving the DNA molecules suspended as 'bridges'. We demonstrate that the suspended molecules are enzymatically accessible. Using an enzyme, we incorporated labeled nucleotides in the pattern predicted by each molecule's sequence. Our results suggest that this platform is a promising candidate for high-throughput sequencing of Mbp-long molecules. It could also have additional genomic applications, such as the study of other protein-DNA interactions.
Affiliation(s)
- Maurizio Righini
- Department of Advanced Research and Development, Centrillion Technologies, Palo Alto, California, United States of America
- Justin Costa
- Department of Advanced Research and Development, Centrillion Technologies, Palo Alto, California, United States of America
- Wei Zhou
- Department of Advanced Research and Development, Centrillion Technologies, Palo Alto, California, United States of America
105
Bolognini D, Magi A. Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data. Front Genet 2021; 12:761791. [PMID: 34868242 PMCID: PMC8637281 DOI: 10.3389/fgene.2021.761791] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/20/2021] [Accepted: 10/11/2021] [Indexed: 01/27/2023]
Abstract
Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
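The precision and recall discussed above come from matching a callset against a truth set. Precision is the fraction of calls that match a true SV, and recall is the fraction of true SVs that are recovered. The sketch below assumes a call matches when chromosome and SV type agree and both breakpoints lie within a tolerance. The 500 bp default is an illustrative choice, not a value from the paper, and dedicated SV benchmarking tools also compare SV size and sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS", "DUP", "INV"


def is_match(call: SV, truth: SV, max_dist: int) -> bool:
    """Same chromosome and type, with both breakpoints within max_dist bp."""
    return (call.chrom == truth.chrom and call.svtype == truth.svtype
            and abs(call.start - truth.start) <= max_dist
            and abs(call.end - truth.end) <= max_dist)


def precision_recall(calls: list[SV], truth: list[SV], max_dist: int = 500):
    """Greedy one-to-one matching of calls to truth SVs; returns (precision, recall, F1)."""
    used = set()  # indices of truth SVs that are already matched
    tp = 0
    for call in calls:
        for i, t in enumerate(truth):
            if i not in used and is_match(call, t, max_dist):
                used.add(i)
                tp += 1
                break
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Greedy matching is the simplest one-to-one scheme. An optimal assignment would match the same or more calls when breakpoints are ambiguous, but it is rarely needed for well-separated SVs.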
Affiliation(s)
- Davide Bolognini
- Unit of Medical Genetics, Meyer Children’s Hospital, Florence, Italy
- Alberto Magi
- Department of Information Engineering, University of Florence, Florence, Italy
106
Jiang T, Liu S, Cao S, Liu Y, Cui Z, Wang Y, Guo H. Long-read sequencing settings for efficient structural variation detection based on comprehensive evaluation. BMC Bioinformatics 2021; 22:552. [PMID: 34772337 PMCID: PMC8588741 DOI: 10.1186/s12859-021-04422-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/16/2021] [Accepted: 10/04/2021] [Indexed: 11/18/2022]
Abstract
Background: With the rapid development of long-read sequencing technologies, it has become possible to reveal the full spectrum of genetic structural variation (SV). However, the high cost, limited read length and high sequencing error rate of long-read data greatly limit the widespread adoption of SV calling. Guidance on sequencing coverage, read length and error rate is therefore urgently needed to maintain high SV yields at the lowest cost.
Results: In this study, we generated a full range of simulated error-prone long-read datasets covering various sequencing settings and comprehensively evaluated SV calling performance with state-of-the-art long-read SV detection methods. The benchmark results demonstrate that almost all SV callers perform better when the long-read data reach 20× coverage, 20 kbp average read length, and an error rate of approximately 10–7.5% or below 1%. Furthermore, sequencing coverage is the most influential factor in promoting SV calling, and it also directly determines the cost.
Conclusions: Based on the comprehensive evaluation results, we provide important guidelines for selecting long-read sequencing settings for efficient SV calling. We believe these recommended settings will provide valuable guidance for cutting-edge genomic studies and clinical practice.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04422-y.
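The 20× coverage and 20 kbp read-length thresholds above translate into a sequencing budget through the usual depth formula: coverage = (number of reads × mean read length) / genome size. The sketch below assumes an illustrative human genome size of about 3.1 Gb; that figure is an assumption, not a number from the abstract.

```python
def coverage(n_reads: int, mean_read_length_bp: int, genome_size_bp: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length_bp / genome_size_bp


def reads_needed(target_coverage: int, genome_size_bp: int, mean_read_length_bp: int) -> int:
    """Smallest number of reads that reaches the target mean depth (integer ceiling)."""
    total_bases = target_coverage * genome_size_bp
    return -(-total_bases // mean_read_length_bp)  # ceiling division on integers


# Assumed genome size of ~3.1 Gb, used only for illustration.
GENOME_BP = 3_100_000_000
n = reads_needed(20, GENOME_BP, 20_000)
print(n, "reads,", n * 20_000 / 1e9, "Gb of sequence")
```

Under these assumptions, about 3.1 million reads of 20 kbp (62 Gb of sequence) are needed for 20×. Halving the read length doubles the read count needed for the same depth.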
Affiliation(s)
- Tao Jiang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Shiqi Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Shuqi Cao
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Yadong Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Zhe Cui
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Yadong Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Hongzhe Guo
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
107
Wu Z, Jiang Z, Li T, Xie C, Zhao L, Yang J, Ouyang S, Liu Y, Li T, Xie Z. Structural variants in the Chinese population and their impact on phenotypes, diseases and population adaptation. Nat Commun 2021; 12:6501. [PMID: 34764282 PMCID: PMC8586011 DOI: 10.1038/s41467-021-26856-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/16/2021] [Accepted: 10/21/2021] [Indexed: 02/05/2023]
Abstract
A complete characterization of genetic variation is a fundamental goal of human genome research. Long-read sequencing has improved the sensitivity of structural variant discovery. Here, we conduct a long-read sequencing-based structural variant analysis of 405 unrelated Chinese individuals with 68 phenotypic and clinical measurements. We discover a landscape of 132,312 nonredundant structural variants, of which 45.2% are novel. The identified structural variants are of high quality, with an estimated false discovery rate of 3.2%. The concatenated length of all the structural variants is approximately 13.2% of the human reference genome. We annotate 1,929 loss-of-function structural variants affecting the coding sequence of 1,681 genes. We discover rare deletions in HBA1/HBA2/HBB associated with anemia. Furthermore, we identify immunity-related structural variants that differentiate the northern and southern Chinese populations. Our study describes the landscape of structural variants in the Chinese population and their contribution to phenotypes and disease.
Affiliation(s)
- Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zehang Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Tong Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Chuanbo Xie
- Sun Yat-sen University Cancer Center, Sun Yat-sen University, Guangzhou, China
- Liansheng Zhao
- Mental Health Center and Psychiatric Laboratory, the State Key Laboratory of Biotherapy, West China Hospital of Sichuan University, Chengdu, China
- Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China
- Jiaqi Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Shuai Ouyang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yizhi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Tao Li
- Mental Health Center and Psychiatric Laboratory, the State Key Laboratory of Biotherapy, West China Hospital of Sichuan University, Chengdu, China
- Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
108
Ma H, Jiang J, He J, Liu H, Han L, Gong Y, Li B, Yu Z, Tang S, Zhang Y, Duan Y, Yin Y, Zeng Q, Yi J, He X, Zeng Y, Kim KS, Xu K, Liang F, He J. Long-read assembly of the Chinese indigenous Ningxiang pig genome and identification of genetic variations in fat metabolism among different breeds. Mol Ecol Resour 2021; 22:1508-1520. [PMID: 34758184 DOI: 10.1111/1755-0998.13550] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/25/2020] [Revised: 10/26/2021] [Accepted: 10/28/2021] [Indexed: 12/15/2022]
Abstract
Advances in long-read sequencing technology and genome assembly provide an opportunity to improve the pig genome and reveal the full range of structural variations (SVs) between local Chinese and European pigs. To date, little is known about the genomes of some unique Chinese indigenous breeds, such as the Ningxiang pig. Here, we report the sequencing and assembly of a highly contiguous Ningxiang pig genome (NX). The assembly integrates PacBio single-molecule real-time sequencing, Illumina next-generation sequencing, BioNano optical mapping and Hi-C (chromosome conformation capture) approaches. The assembled genome comprises 2.44 Gb with a contig N50 of 26.1 Mb and 418 contigs in total. These contigs are organized into 121 scaffolds with a scaffold N50 of 139.0 Mb. More than 99.1% of the assembled sequence could be localized to 19 pseudochromosomes. The assembly is annotated with 20,914 protein-coding genes, and repetitive sequences make up 34.04% of it. Comparisons between the NX and European Duroc assemblies revealed many SVs in genes involved in the immune system, nervous system, lipid metabolism and environmental adaptation. These variants include 47 SVs specific to Chinese domestic pigs, and the 74 associated genes may contribute to differences in domestic traits compared with European pigs. Moreover, single nucleotide polymorphisms (SNPs) were identified from whole-genome resequencing data of 73 Chinese pigs representing 17 geographically isolated breeds. These SNPs revealed the breeds' specific genetic variations, population structure and evolutionary patterns. Finally, we explored transcriptional regulation in the first intron of the MYL4 gene, where an SV (a 281-bp deletion) in the Ningxiang pig promotes subcutaneous fat deposition compared with European pig breeds. This work identifies a set of Asian-specific SVs and SNPs, which will be important resources for modern pig breeding and genetic conservation.
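The contig and scaffold N50 values reported above (26.1 Mb and 139.0 Mb) follow the standard definition. N50 is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of the assembled bases. The sketch below computes it; the function name is illustrative.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the total bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest sequences first
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For contig lengths 2 through 10, the total is 54. The three longest contigs (10 + 9 + 8 = 27) reach half of that, so the N50 is 8.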
Affiliation(s)
- Haiming Ma
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
- Juan Jiang
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
- Jun He
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
- Yan Gong
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
- Biao Li
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
- Zonggang Yu
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
- Shengguo Tang
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
- Yuebo Zhang
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
- Yehui Duan
- Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, PR China
- Yulong Yin
- Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, PR China
- Qinghua Zeng
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China; Ningxiang Pig Farm of Dalong Livestock Technology Co., Ltd, Ningxiang, PR China
- Xinglong He
- Bureau of Animal Husbandry, Veterinary and Fisheries in Ningxiang City, Ningxiang, PR China
- Yongbo Zeng
- Bureau of Animal Husbandry, Veterinary and Fisheries in Ningxiang City, Ningxiang, PR China
- Kung Seok Kim
- Department of Natural Resources Ecology and Management, Iowa State University, Ames, Iowa, USA
- Kang Xu
- Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, PR China
- Fan Liang
- Grandomics Biosciences, Wuhan, PR China
- Jianhua He
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, PR China
109
Ward CM, Perry KD, Baker G, Powis K, Heckel DG, Baxter SW. A haploid diamondback moth (Plutella xylostella L.) genome assembly resolves 31 chromosomes and identifies a diamide resistance mutation. Insect Biochem Mol Biol 2021; 138:103622. [PMID: 34252570 DOI: 10.1016/j.ibmb.2021.103622] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/25/2021] [Revised: 07/04/2021] [Accepted: 07/04/2021] [Indexed: 05/21/2023]
Abstract
The diamondback moth, Plutella xylostella (L.), is a highly mobile brassica crop pest with a worldwide distribution, and it can rapidly evolve resistance to insecticides, including group 28 diamides. Reference genomes assembled using Illumina sequencing technology have provided valuable resources for studying the biology, origin and movement of the diamondback moth and, more recently, of its sister species, Plutella australiana. Here we apply a trio binning approach to sequence and annotate a chromosome-level reference genome of P. xylostella using PacBio Sequel and Dovetail Hi-C sequencing technology. We also identify a point mutation that causes resistance to commercial diamides. A P. xylostella population collected from brassica crops in the Lockyer Valley, Australia (LV-R), was reselected for chlorantraniliprole resistance. A single male was then crossed to a P. australiana female, and a hybrid pupa was sequenced. A chromosome-level 328 Mb P. xylostella genome was assembled, with 98.1% assigned to 30 autosomes and the Z chromosome. The genome was highly complete, with 98.4% of BUSCO Insecta genes identified, and RNA-seq-informed protein prediction annotated 19,002 coding genes. The LV-R strain survived recommended field application doses of chlorantraniliprole, flubendiamide and cyclaniliprole. Some hybrids also survived these doses, indicating a significant departure from recessivity that has not previously been documented for diamides. Diamide chemicals modulate insect ryanodine receptors (RyR), disrupting calcium homeostasis. We identified an amino acid substitution (I4790K) recently reported to cause diamide resistance in a strain from Japan. This chromosome-level assembly provides a new resource for insect comparative genomics and highlights the emergence of diamide resistance in Australia. Resistance management plans need to account for the fact that resistance is not completely recessive.
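Trio binning, as applied above, partitions the reads of a hybrid by parental origin before assembly. It relies on k-mers that occur in only one parent; here the parents were a P. xylostella male and a P. australiana female. The sketch below covers the classification step and assumes parental reads are available as plain strings. The tiny k, toy sequences and function names are illustrative only. Published implementations use much larger k and also normalize hit counts by the number of parent-specific k-mers.

```python
# Complement table for canonical k-mers (A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement,
    so reads from either strand yield the same k-mer."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_set(seq: str, k: int) -> set[str]:
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def parent_specific_kmers(reads_a: list[str], reads_b: list[str], k: int):
    """k-mers seen in one parent's reads but never in the other's."""
    a = set().union(*(kmer_set(r, k) for r in reads_a))
    b = set().union(*(kmer_set(r, k) for r in reads_b))
    return a - b, b - a


def classify_read(read: str, a_only: set[str], b_only: set[str], k: int) -> str:
    """Assign a hybrid read to the parent whose specific k-mers it contains more of."""
    ks = kmer_set(read, k)
    hits_a, hits_b = len(ks & a_only), len(ks & b_only)
    if hits_a > hits_b:
        return "A"
    if hits_b > hits_a:
        return "B"
    return "unassigned"  # no informative k-mers, or a tie
```

Canonical k-mers matter because a hybrid read may come from either strand. In the toy case, a read `"CTTT"` is still binned to a parent whose reads contain `"AAAG"`.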
Affiliation(s)
- C M Ward
- School of Biological Sciences, University of Adelaide, 5005, Australia
- K D Perry
- South Australian Research and Development Institute, Urrbrae, 5064, Australia
- G Baker
- South Australian Research and Development Institute, Urrbrae, 5064, Australia
- K Powis
- South Australian Research and Development Institute, Urrbrae, 5064, Australia
- D G Heckel
- Department of Entomology, Max Planck Institute for Chemical Ecology, 07745, Jena, Germany
- S W Baxter
- Bio21 Institute, School of BioSciences, University of Melbourne, 3052, Australia
110
Luo X, Sun D, Wang S, Luo S, Fu Y, Niu L, Shi Q, Zhang Y. Integrating full-length transcriptomics and metabolomics reveals the regulatory mechanisms underlying yellow pigmentation in tree peony (Paeonia suffruticosa Andr.) flowers. Hortic Res 2021; 8:235. [PMID: 34719694 PMCID: PMC8558324 DOI: 10.1038/s41438-021-00666-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/15/2021] [Revised: 07/08/2021] [Accepted: 07/27/2021] [Indexed: 06/02/2023]
Abstract
Tree peony (Paeonia suffruticosa Andr.) is a popular ornamental plant in China due to its showy and colorful flowers. However, yellow-colored flowers are rare in both wild species and domesticated cultivars. The molecular mechanisms underlying yellow pigmentation remain poorly understood. Here, petal tissues of two tree peony cultivars, "High Noon" (yellow flowers) and "Roufurong" (purple-red flowers), were sampled at five developmental stages (S1-S5) from early flower buds to full bloom. Five petal color indices (brightness, redness, yellowness, chroma, and hue angle) and the contents of ten different flavonoids were determined. "Roufurong" accumulated abundant anthocyanins at S3-S5. In contrast, the yellow-colored "High Noon" produced no anthocyanins and had relatively higher contents of tetrahydroxychalcone (THC), flavones, and flavonols. The contents of THC, flavones, and flavonols in "High Noon" peaked at S3 and dropped gradually as the flower bloomed, consistent with the color index patterns. Furthermore, RNA-seq analyses at S3 showed that structural genes such as PsC4Hs, PsDFRs, and PsUFGTs in the flavonoid biosynthesis pathway were downregulated in "High Noon," whereas most PsFLSs, PsF3Hs, and PsF3'Hs were upregulated. Five transcription factor (TF) genes related to flavonoid biosynthesis were also upregulated in "High Noon." One of these TFs, PsMYB111, was overexpressed in tobacco, which led to increased flavonols but decreased anthocyanins. Dual-luciferase assays further confirmed that PsMYB111 upregulated PsFLS. These results improve our understanding of yellow pigmentation in tree peony and provide guidance for future molecular-assisted breeding of tree peony cultivars with novel flower colors.
Affiliation(s)
- Xiaoning Luo
- College of Landscape Architecture and Art, Northwest A&F University, Yangling, China
- Daoyang Sun
- College of Landscape Architecture and Art, Northwest A&F University, Yangling, China
- Shu Wang
- College of Landscape Architecture and Art, Northwest A&F University, Yangling, China
- Sha Luo
- College of Landscape Architecture and Art, Northwest A&F University, Yangling, China
- Yaqi Fu
- College of Landscape Architecture and Art, Northwest A&F University, Yangling, China
- Lixin Niu
- College of Landscape Architecture and Art, Northwest A&F University, Yangling, China
- Qianqian Shi
- College of Landscape Architecture and Art, Northwest A&F University, Yangling, China
- Yanlong Zhang
- College of Landscape Architecture and Art, Northwest A&F University, Yangling, China
111
Long-read technologies identify a hidden inverted duplication in a family with choroideremia. HGG Adv 2021; 2:100046. [PMID: 35047838 PMCID: PMC8756506 DOI: 10.1016/j.xhgg.2021.100046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 07/01/2021] [Indexed: 12/03/2022]
Abstract
The lack of molecular diagnoses in rare genetic diseases can be explained by limitations of current standard genomic technologies. Emerging long-read techniques have complementary strengths that can overcome these limitations, particularly in identifying structural variants. Using optical genome mapping and long-read sequencing, we aimed to identify the pathogenic variant in a large family with X-linked choroideremia. In this family, aberrant splicing of exon 12 of the choroideremia gene CHM was detected in 2003, but the underlying genomic defect remained elusive. Optical genome mapping and long-read sequencing have now revealed an intragenic 1,752 bp inverted duplication located downstream of the wild-type copy of exon 12. The duplicated segment includes exon 12 and its surrounding regions. Both breakpoint junctions were confirmed with Sanger sequencing and co-segregate with the disease in the family, consistent with X-linked inheritance. The breakpoint junctions displayed sequence microhomology suggestive of an erroneous replication mechanism as the origin of the structural variant. The inverted duplication is predicted to cause the pre-mRNA to form a hairpin with the wild-type exon 12, leading to exon skipping in the mature mRNA. The identified inverted duplication is deemed the hidden pathogenic cause of disease in this family. Our study shows that optical genome mapping and long-read sequencing have significant potential for the identification of (hidden) structural variants in rare genetic diseases.
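An inverted duplication places a reverse-complemented copy of a segment downstream of the original. This is also why the two copies can base-pair into a hairpin in the pre-mRNA. The sketch below spots the pattern in an assembled haplotype by exact string search. It assumes error-free sequence and an exact copy, whereas real breakpoint analysis aligns reads or contigs and tolerates mismatches. The function name and toy sequences are hypothetical.

```python
# Complement table for the four DNA bases (no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_duplication(haplotype: str, segment: str):
    """Return (forward_start, inverted_start) if `segment` is followed downstream
    by its reverse complement in `haplotype`; otherwise return None."""
    fwd = haplotype.find(segment)
    if fwd == -1:
        return None
    # Only search after the forward copy, i.e. downstream of the wild-type segment.
    inv = haplotype.find(reverse_complement(segment), fwd + len(segment))
    if inv == -1:
        return None
    return fwd, inv
```

In the toy haplotype `"TTATGCCAAAGGCATTT"`, the segment `"ATGCC"` starts at position 2 and its reverse complement `"GGCAT"` starts at position 10, so the function returns `(2, 10)`.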
112
Suzuki Y, Morishita S. The time is ripe to investigate human centromeres by long-read sequencing. DNA Res 2021; 28:6381569. [PMID: 34609504 PMCID: PMC8502840 DOI: 10.1093/dnares/dsab021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/31/2021] [Accepted: 09/28/2021] [Indexed: 01/05/2023]
Abstract
The complete sequencing of human centromeres, which are filled with highly repetitive elements, has long been challenging. In human centromeres, α-satellite monomers of about 171 bp are the basic repeating units; these monomers are organized into higher-order repeat (HOR) units, and thousands of copies of highly homologous HOR units form large arrays, which has hampered sequence assembly of human centromeres. Because most HOR unit occurrences are covered by long reads of about 10 kb, the recent availability of much longer reads is expected to make it possible to observe individual HOR occurrences in terms of their single-nucleotide or structural variants. The time has come to examine the complete sequence of human centromeres.
Affiliation(s)
- Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
113
Yan SM, Sherman RM, Taylor DJ, Nair DR, Bortvin AN, Schatz MC, McCoy RC. Local adaptation and archaic introgression shape global diversity at human structural variant loci. eLife 2021; 10:e67615. [PMID: 34528508 PMCID: PMC8492059 DOI: 10.7554/elife.67615] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/18/2021] [Accepted: 09/14/2021] [Indexed: 12/13/2022]
Abstract
Large genomic insertions and deletions are a potent source of functional variation but are challenging to resolve with short-read sequencing, which limits knowledge of the role of such structural variants (SVs) in human evolution. Here, we used a graph-based method to genotype long-read-discovered SVs in short-read data from diverse human genomes. We then applied an admixture-aware method to identify 220 SVs exhibiting extreme patterns of frequency differentiation, a signature of local adaptation. The top two variants traced to the immunoglobulin heavy chain locus, tagging a haplotype that swept to near fixation in certain Southeast Asian populations but is rare in other global populations. Further investigation revealed evidence that the haplotype traces to gene flow from Neanderthals, corroborating the role of immune-related genes as prominent targets of adaptive introgression. Our study demonstrates how recent technical advances can help resolve signatures of key evolutionary events that remained obscured within technically challenging regions of the genome.
Affiliation(s)
- Stephanie M Yan
- Department of Biology, Johns Hopkins University, Baltimore, United States
- Rachel M Sherman
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, United States
- Divya R Nair
- Department of Biology, Johns Hopkins University, Baltimore, United States
- Andrew N Bortvin
- Department of Biology, Johns Hopkins University, Baltimore, United States
- Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
- Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, United States
114
PacBio sequencing output increased through uniform and directional fivefold concatenation. Sci Rep 2021; 11:18065. [PMID: 34508117 PMCID: PMC8433307 DOI: 10.1038/s41598-021-96829-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/06/2021] [Accepted: 08/17/2021] [Indexed: 12/20/2022]
Abstract
Advances in sequencing technology have allowed researchers to sequence DNA with greater ease and at decreasing cost. Most developments have focused either on sequencing many short sequences or on sequencing fewer long sequences. Methods for sequencing mid-sized sequences of 600-5,000 bp are currently less efficient. For example, the PacBio Sequel I system yields ~100,000-300,000 reads with a per-base accuracy of 90-99%. We sought to sequence several DNA populations of ~870 bp in length with a sequencing accuracy of 99% and to the greatest depth possible. We optimised a simple, robust method to concatenate genes of ~870 bp five times and then sequenced the resulting ~5,000 bp DNA by PacBio SMRT long-read sequencing. Our method improved upon previously published concatenation attempts, yielding greater sequencing depth and high-quality reads with limited sample preparation at little expense. We applied this efficient concatenation protocol to sequence nine DNA populations from a protein engineering study. The improved method is accompanied by a simple and user-friendly analysis pipeline, DeCatCounter, to sequence medium-length sequences efficiently at one-fifth of the cost.
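The bookkeeping behind splitting a fivefold concatemer read back into its units and counting variants can be sketched as below. This is not DeCatCounter's implementation: it assumes, hypothetically, that units are joined by a fixed, known linker sequence (the `LINKER` value is made up) and that reads are already in the forward orientation, which the paper's directional concatenation is meant to ensure.

```python
from collections import Counter

LINKER = "GGATCC"  # hypothetical junction sequence, not taken from the paper


def split_concatemer(read, linker=LINKER, expected_units=5):
    """Split a directional concatemer read into units; return [] if the unit count is unexpected."""
    units = [u for u in read.split(linker) if u]  # drop empty pieces from leading/trailing linkers
    return units if len(units) == expected_units else []


def count_variants(reads, linker=LINKER, expected_units=5):
    """Tally each unit sequence across all reads that split into exactly expected_units pieces."""
    counts = Counter()
    for read in reads:
        counts.update(split_concatemer(read, linker, expected_units))
    return counts
```

A real pipeline would need an error-tolerant linker search, since a linker-like motif inside a gene or a sequencing error in the linker would break exact string splitting.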
115
Yang L, Malhotra R, Chikhi R, Elleder D, Kaiser T, Rong J, Medvedev P, Poss M. Recombination marks the evolutionary dynamics of a recently endogenized retrovirus. Mol Biol Evol 2021; 38:5423-5436. [PMID: 34480565 PMCID: PMC8662619 DOI: 10.1093/molbev/msab252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
Abstract
All vertebrate genomes have been colonized by retroviruses along their evolutionary trajectory. Although endogenous retroviruses (ERVs) can contribute important physiological functions to contemporary hosts, such benefits are attributed to long-term coevolution of ERV and host, because germline infections are rare, expansion is slow, and the host effectively silences them. The genomes of several outbred species, including mule deer (Odocoileus hemionus), are currently being colonized by ERVs, which provides an opportunity to study ERV dynamics at a time when few are fixed. We previously established the locus-specific distribution of cervid ERV (CrERV) in populations of mule deer. In this study, we determine the molecular evolutionary processes acting on CrERV at each locus in the context of phylogenetic origin, genome location, and population prevalence. A mule deer genome was assembled de novo from short- and long-insert mate-pair reads, and the CrERV sequence at each locus was generated. We report that CrERV composition and diversity have recently increased measurably through horizontal acquisition of a new retrovirus lineage. This new lineage has further expanded CrERV burden and CrERV genomic diversity by activating and recombining with existing CrERVs. The resulting interlineage recombinants then endogenize and subsequently expand. CrERV loci are significantly closer to genes than expected if integration were random, and gene proximity might explain the recent expansion of one recombinant CrERV lineage. Thus, in mule deer, retroviral colonization is a dynamic period in the molecular evolution of CrERV that also provides a burst of genomic diversity to the host population.
Affiliation(s)
- Lei Yang
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA, 16802, USA
- Raunaq Malhotra
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Rayan Chikhi
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA, 16802, USA
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Daniel Elleder
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Vídeňská 1083, 14220 Prague, Czech Republic
- Theodora Kaiser
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Jesse Rong
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Paul Medvedev
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA, 16802, USA
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Mary Poss
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA, 16802, USA
116
Genome Assemblies across the Diverse Evolutionary Spectrum of Leishmania Protozoan Parasites. Microbiol Resour Announc 2021; 10:e0054521. [PMID: 34472979 PMCID: PMC8411921 DOI: 10.1128/mra.00545-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
We report high-quality draft assemblies and gene annotations for 13 species and/or strains of the protozoan parasite genera Leishmania, Endotrypanum, and Crithidia, which span the phylogenetic diversity of the subfamily Leishmaniinae within the kinetoplastid order of the phylum Euglenozoa. These resources will support studies on the origins of parasitism.
117
Abstract
Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graph-based methods to identify and analyze genes, regions, and mutations relevant to an array of biological questions. This field has seen significant progress in recent years, including the development of graph-based models that better resolve biological phenomena and an explosion of new tools for mapping reads, constructing graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models and algorithms associated with graphical genomes, and compare similar tools. In addition, we briefly discuss what these developments may mean for the future of genomics.
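To make the underlying data structure concrete, the toy Python sketch below models a sequence graph whose nodes carry DNA segments and whose haplotypes are stored as walks through node IDs, so that a SNP appears as a two-branch "bubble". It is a minimal illustration under simplifying assumptions (forward strand only, no node orientation), not the model of any particular tool covered by the review; class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph: nodes hold DNA segments, haplotypes are walks over node ids."""
    nodes: dict = field(default_factory=dict)  # node id -> sequence
    edges: set = field(default_factory=set)    # (from_id, to_id), forward strand only
    paths: dict = field(default_factory=dict)  # haplotype name -> list of node ids

    def add_path(self, name, walk):
        """Register a haplotype walk and the edges it traverses."""
        missing = [n for n in walk if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown node ids: {missing}")
        self.edges.update(zip(walk, walk[1:]))
        self.paths[name] = list(walk)

    def spell(self, name):
        """Concatenate node sequences along a haplotype's walk."""
        return "".join(self.nodes[n] for n in self.paths[name])


g = SequenceGraph(nodes={1: "ACGT", 2: "T", 3: "G", 4: "CCA"})
g.add_path("ref", [1, 2, 4])  # reference allele T at the bubble
g.add_path("alt", [1, 3, 4])  # alternate allele G (a SNP bubble)
print(g.spell("ref"), g.spell("alt"))  # ACGTTCCA ACGTGCCA
```

Shared sequence (nodes 1 and 4) is stored once, which is the basic space saving that graph pangenomes exploit; production formats such as GFA add node orientation and other features this sketch leaves out.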
118
Gao X, Mo W, Shi J, Song N, Liang P, Chen J, Shi Y, Guo W, Li X, Yang X, Xin B, Zhao H, Song W, Lai J. HITAC-seq enables high-throughput cost-effective sequencing of plasmids and DNA fragments with identity. J Genet Genomics 2021; 48:671-680. [PMID: 34417123 DOI: 10.1016/j.jgg.2021.05.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Revised: 05/03/2021] [Accepted: 05/13/2021] [Indexed: 01/13/2023]
Abstract
DNA sequencing is vital for many aspects of biological research and diagnostics. Despite the development of second- and third-generation sequencing technologies, Sanger sequencing has long been the only choice when each sequenced plasmid or DNA fragment must be tracked precisely. Here, we report a complete, novel barcoding and assembly system, Highly-parallel Indexed Tagmentation-reads Assembled Consensus sequencing (HITAC-seq), that can sequence samples massively in parallel while tracking the identity of each individual sample. At a cost much lower than that of a single Sanger sequencing read, HITAC-seq can generate high-quality contiguous sequences of up to 10 kilobases or longer. The capability of HITAC-seq was confirmed through large-scale sequencing of thousands of plasmid clones and hundreds of amplicon fragments using approximately 100 pg of input DNA. Because of its long synthetic read length, HITAC-seq was effective in detecting relatively large structural variations, as demonstrated by the identification of a ∼1.3 kb Copia retrotransposon insertion upstream of a likely maize domestication gene. Besides being a practical alternative to traditional Sanger sequencing, HITAC-seq is suitable for many high-throughput sequencing and genotyping applications.
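The identity-tracking idea, in which reads tagged with a per-sample index are grouped and collapsed into one consensus per sample, can be sketched as follows. This is a hypothetical simplification and not the HITAC-seq pipeline, which assembles tagmentation reads: here reads are assumed to arrive as (barcode, sequence) pairs that are already aligned to a common coordinate frame with equal lengths, and a per-position majority vote stands in for assembly. Function names are illustrative.

```python
from collections import Counter, defaultdict


def group_by_barcode(reads):
    """Group (barcode, sequence) pairs into a dict of barcode -> list of sequences."""
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    return groups


def majority_consensus(seqs):
    """Per-position majority vote over equal-length, pre-aligned sequences ('-' marks a gap)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    # Ties go to the base seen first in that column (Counter.most_common ordering).
    votes = (Counter(column).most_common(1)[0][0] for column in zip(*seqs))
    return "".join(base for base in votes if base != "-")
```

For example, three reads for barcode `bc1` reading `ACGT`, `ACGA`, and `ACGT` collapse to `ACGT`, outvoting the single discordant base.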
Affiliation(s)
- Xiang Gao
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Weipeng Mo
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Junpeng Shi
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Ning Song
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Pei Liang
- Department of Microbiology and Immunology, College of Biological Sciences, China Agricultural University, Beijing 100193, PR China
- Jian Chen
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Yiting Shi
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, PR China
- Weilong Guo
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, PR China
- Xinchen Li
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Xiaohong Yang
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, PR China
- Beibei Xin
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Haiming Zhao
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Weibin Song
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Jinsheng Lai
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
- Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, PR China
119
Li A, Wang J, Sun K, Wang S, Zhao X, Wang T, Xiong L, Xu W, Qiu L, Shang Y, Liu R, Wang S, Lu Y. Two reference-quality sea snake genomes reveal their divergent evolution of adaptive traits and venom systems. Mol Biol Evol 2021; 38:4867-4883. [PMID: 34320652 PMCID: PMC8557462 DOI: 10.1093/molbev/msab212] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/17/2022]
Abstract
True sea snakes (Hydrophiini) are among the last and most successful clades of vertebrates to show secondary marine adaptation, exhibiting diverse phenotypic traits and lethal venom systems. To better understand their evolution, we generated the first chromosome-level genomes of two representative Hydrophiini snakes, Hydrophis cyanocinctus and H. curtus. Through comparative genomics, we identified a great expansion of the underwater olfaction-related V2R gene family, consisting of more than 1,000 copies in both snakes. A series of chromosome rearrangements and genomic structural variations were recognized, including large inversions longer than 30 megabases (Mb) on sex chromosomes, which potentially affect key functional genes associated with differentiated phenotypes between the two species. By integrating multiomics data, we found a significant loss of the major weapon for elapid predation, three-finger toxin genes, which displayed a dosage effect in H. curtus. These genetic changes may point to mechanisms that drove the divergent evolution of adaptive traits, including prey preferences, between the two closely related snakes. Our reference-quality sea snake genomes also enrich the repositories for addressing important issues in the evolution of marine tetrapods and provide a resource for discovering marine-derived biological products.
Affiliation(s)
- An Li
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Junjie Wang
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Kuo Sun
- School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Shuocun Wang
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Xin Zhao
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Tingfang Wang
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Liyan Xiong
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Weiheng Xu
- School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Lei Qiu
- School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Yan Shang
- Department of Respiratory and Critical Care Medicine, Changhai Hospital, Second Military Medical University, Shanghai, 200433, China
- Runhui Liu
- School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Sheng Wang
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Yiming Lu
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- School of Medicine, Shanghai University, Shanghai, 200444, China
120
Vervoort L, Dierckxsens N, Pereboom Z, Capozzi O, Rocchi M, Shaikh TH, Vermeesch JR. 22q11.2 Low Copy Repeats Expanded in the Human Lineage. Front Genet 2021; 12:706641. [PMID: 34335701 PMCID: PMC8320366 DOI: 10.3389/fgene.2021.706641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/13/2021] [Accepted: 06/23/2021] [Indexed: 11/13/2022]
Abstract
Segmental duplications, or low copy repeats (LCRs), are duplicated regions interspersed in the human genome that are currently neglected in standard analyses because of their extreme complexity. Recent functional studies have pointed to potential roles of genes within LCRs in synaptogenesis, neuronal migration, and neocortical expansion in the human lineage. One of the regions with the highest proportion of duplicated sequence is the 22q11.2 locus, which carries eight LCRs (LCR22-A to LCR22-H); rearrangements between them cause the 22q11.2 deletion syndrome. The LCR22-A block was recently reported to be hypervariable in the human population. It remains unknown whether this variability also exists in non-human primates, since research is strongly hampered by sequence gaps in the human and non-human primate reference genomes. To chart the LCR22 haplotypes and the associated inter- and intra-species variability, we de novo assembled the region in non-human primates using a combination of optical mapping techniques. A minimal and likely ancient haplotype is present in the chimpanzee, bonobo, and rhesus monkey, without intra-species variation. In addition, the optical maps identified assembly errors and closed gaps in the orthologous chromosome 22 reference sequences. These findings indicate that the LCR22 expansion is unique to the human population, which might point to an involvement of the region in human evolution and adaptation. These maps will enable LCR22-specific functional studies and the investigation of potential associations with the phenotypic variability in the 22q11.2 deletion syndrome.
Affiliation(s)
- Zjef Pereboom
- Centre for Research and Conservation, Royal Zoological Society of Antwerp, Antwerp, Belgium
- Evolutionary Ecology Group, Department of Biology, Antwerp University, Antwerp, Belgium
- Tamim H. Shaikh
- Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, United States
121
Akbarinejad S, Hadadian Nejad Yousefi M, Goudarzi M. SVNN: an efficient PacBio-specific pipeline for structural variations calling using neural networks. BMC Bioinformatics 2021; 22:335. [PMID: 34147063 PMCID: PMC8214287 DOI: 10.1186/s12859-021-04184-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/12/2020] [Accepted: 05/11/2021] [Indexed: 11/10/2022]
Abstract
Background Once aligned, long reads can be a useful source of information for identifying the type and position of structural variations. However, because of the high sequencing error rate of long reads, long-read structural variation detection methods are far from precise in low-coverage cases. To be accurate, they need high-coverage data, which in turn results in an extremely time-consuming pipeline, especially in the alignment phase. Therefore, it is of utmost importance to have a structural variation calling pipeline that is both fast and precise for low-coverage data. Results In this paper, we present SVNN, a fast yet accurate structural variation calling pipeline for PacBio long reads that takes raw reads as input and detects structural variants larger than 50 bp. Our pipeline uses state-of-the-art long-read aligners, namely NGMLR and Minimap2, and structural variation callers, namely Sniffles and SVIM. We found that by using a neural network, we can extract features from Minimap2 output to detect a subset of reads that provide useful information for structural variation detection. By mapping only this subset with NGMLR, which is far slower than Minimap2 but better serves downstream structural variation detection, we can increase sensitivity efficiently. As a result of using multiple tools intelligently, SVNN achieves up to 20 percentage points of sensitivity improvement in comparison with state-of-the-art methods and is three times faster than a naive combination of state-of-the-art tools achieving almost the same accuracy. Conclusion Since the prohibitive cost of high-coverage data has impeded long-read applications, SVNN provides users with a much faster structural variation detection platform for PacBio reads, with high precision and sensitivity in low-coverage scenarios.
Affiliation(s)
- Shaya Akbarinejad
- Department of Computer Engineering, Sharif University of Technology, Azadi Ave., Tehran, Iran
- Maziar Goudarzi
- Department of Computer Engineering, Sharif University of Technology, Azadi Ave., Tehran, Iran
122
Khorsand P, Hormozdiari F. Nebula: ultra-efficient mapping-free structural variant genotyper. Nucleic Acids Res 2021; 49:e47. [PMID: 33503255 PMCID: PMC8096284 DOI: 10.1093/nar/gkab025] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/18/2020] [Revised: 01/03/2021] [Accepted: 01/11/2021] [Indexed: 11/24/2022]
Abstract
Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, genotyping these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, uses changes in k-mer counts to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants but also has accuracy comparable to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
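The core intuition, that a structural variant changes the counts of k-mers specific to one allele, can be sketched in a few lines of Python. This is a simplified illustration and not Nebula's actual model: it assumes, hypothetically, that a set of k-mers unique to the SV allele (for example, k-mers spanning an insertion) is already known, along with the expected count of a single-copy k-mer (`coverage_per_copy`), and it calls the genotype whose expected count (0, 1 or 2 copies) is nearest the observed mean count.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, to keep the sketch short)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def genotype_from_kmers(counts, sv_kmers, coverage_per_copy):
    """Call '0/0', '0/1' or '1/1' by comparing the mean count of SV-specific k-mers
    with the count expected for zero, one or two copies of the SV allele."""
    if not sv_kmers:
        raise ValueError("need at least one SV-specific k-mer")
    observed = sum(counts[km] for km in sv_kmers) / len(sv_kmers)
    expected = {"0/0": 0.0, "0/1": coverage_per_copy, "1/1": 2.0 * coverage_per_copy}
    return min(expected, key=lambda gt: abs(observed - expected[gt]))


reads = ["ACGTACGGT"] * 3
counts = count_kmers(reads, 4)
print(genotype_from_kmers(counts, ["ACGT", "CGTA"], coverage_per_copy=3.0))  # 0/1
```

Because no read is aligned, the cost is dominated by a single pass of k-mer counting, which is the source of the speed advantage the abstract describes; a realistic genotyper would also need canonical (strand-independent) k-mers and a noise model.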
Affiliation(s)
- Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, California, 95616, USA
- UC Davis MIND Institute, Sacramento, California, 95817, USA
- Department of Biochemistry and Molecular Medicine, UC Davis, Sacramento, California, 95817, USA
123
Yin M, Chu S, Shan T, Zha L, Peng H. Full-length transcriptome sequences by a combination of sequencing platforms applied to isoflavonoid and triterpenoid saponin biosynthesis of Astragalus mongholicus Bunge. Plant Methods 2021; 17:61. [PMID: 34130711 PMCID: PMC8207730 DOI: 10.1186/s13007-021-00762-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/07/2020] [Accepted: 06/07/2021] [Indexed: 05/17/2023]
Abstract
BACKGROUND Astragalus mongholicus Bunge is an important medicinal plant used in traditional Chinese medicine. It is rich in isoflavonoids and triterpenoid saponins. Although these active constituents of A. mongholicus have long been known, the genetic basis of isoflavonoid and triterpenoid saponin biosynthesis in this plant is virtually unknown because a reference genome is lacking. Here, we used a combination of next-generation sequencing (NGS) and single-molecule real-time (SMRT) sequencing to identify genes involved in the biosynthetic pathways of secondary metabolites in A. mongholicus. RESULTS In this study, NGS, SMRT sequencing, and targeted compound analysis were combined to investigate the association between isoflavonoid and triterpenoid saponin content and specific gene expression in the root, stem, and leaves of A. mongholicus. Overall, 643,812 circular consensus sequence (CCS) reads were generated, yielding 121,107 non-redundant transcript isoforms with an N50 value of 2,124 bp. Based on these highly accurate transcripts, 104,756 (86.50%) transcripts were successfully annotated in at least one of seven databases (NR, NT, Swissprot, KEGG, KOG, Pfam and GO). Levels of four isoflavonoids and four astragalosides (triterpenoid saponins) were determined. Forty-four differentially expressed genes (DEGs) involved in isoflavonoid biosynthesis and 44 DEGs from 16 gene families that encode enzymes involved in triterpenoid saponin biosynthesis were identified. Transcription factors (TFs) associated with isoflavonoid and triterpenoid saponin biosynthesis, including 72 MYBs, 53 bHLHs, 64 AP2-EREBPs, and 11 bZIPs, were also identified. These transcripts showed different expression trends in different plant organs. CONCLUSIONS This study provides important genetic information on the A. mongholicus genes that are essential for isoflavonoid and triterpenoid saponin biosynthesis and provides a basis for developing the medicinal value of this plant.
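The N50 reported for the transcript isoforms is a length statistic: sort the sequences from longest to shortest and accumulate their lengths; the N50 is the length of the sequence at which the running total first reaches half of all bases. A short Python sketch of this standard computation, with invented input lengths:

```python
def n50(lengths):
    """Length of the sequence at which the cumulative sum, taken from the longest
    sequence downward, first reaches at least half of the total length."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids a fractional half
            return length


print(n50([2, 3, 4, 5, 6, 7, 8]))  # 6
```

Here the total is 35; 8 + 7 = 15 falls short of 17.5, while 8 + 7 + 6 = 21 exceeds it, so the N50 is 6.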
Affiliation(s)
- Minzhen Yin
- School of Pharmacy, Anhui University of Chinese Medicine, Hefei, 230012, China
- State Key Laboratory of Dao-Di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Research Unit of DAO-DI Herbs, Chinese Academy of Medical Sciences, 2019RU57, Beijing, 100700, China
- Shanshan Chu
- School of Pharmacy, Anhui University of Chinese Medicine, Hefei, 230012, China
- Anhui Province Key Laboratory of Research & Development of Chinese Medicine, Hefei, 230012, China
- Tingyu Shan
- School of Pharmacy, Anhui University of Chinese Medicine, Hefei, 230012, China
- Liangping Zha
- School of Pharmacy, Anhui University of Chinese Medicine, Hefei, 230012, China
- Institute of Conservation and Development of Traditional Chinese Medicine Resources, Anhui Academy of Chinese Medicine, Hefei, 230012, China
- Huasheng Peng
- School of Pharmacy, Anhui University of Chinese Medicine, Hefei, 230012, China
- State Key Laboratory of Dao-Di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Research Unit of DAO-DI Herbs, Chinese Academy of Medical Sciences, 2019RU57, Beijing, 100700, China
124
Guo M, Li S, Zhou Y, Li M, Wen Z. Comparative Analysis for the Performance of Long-Read-Based Structural Variation Detection Pipelines in Tandem Repeat Regions. Front Pharmacol 2021; 12:658072. [PMID: 34163355 PMCID: PMC8215501 DOI: 10.3389/fphar.2021.658072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 05/14/2021] [Indexed: 12/04/2022]
Abstract
There has been growing recognition of the vital links between structural variations (SVs) and diverse diseases. Research suggests that, with much longer DNA fragments and abundant contextual information, long-read technologies have advantages in SV detection even in complex repetitive regions. So far, several pipelines for calling SVs from long-read sequencing data have been proposed and used in human genome research. However, the performance of these pipelines has not yet been explored in depth or adequately compared. In this study, we comprehensively evaluated the performance of three commonly used long-read SV detection pipelines, namely PBSV, Sniffles and PBHoney, especially their performance in detecting SVs in tandem repeat regions (TRRs). Using a robust benchmark for germline SV detection as the gold standard, we estimated the precision, recall and F1 score of the insertions and deletions detected by each pipeline. Our results revealed that all of these pipelines clearly performed better outside TRRs than inside them. The F1 scores of Sniffles inside and outside TRRs were 0.60 and 0.76, respectively. The performance of PBSV was similar to that of Sniffles and generally higher than that of PBHoney. In conclusion, our findings can help in choosing appropriate pipelines in practice and complement the application of long-read sequencing technologies in rare disease research.
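Precision, recall and F1 in benchmarks like this one are computed from true-positive (TP), false-positive (FP) and false-negative (FN) counts obtained after matching calls to the truth set. The matching criteria (breakpoint distance, size similarity and so on) depend on the benchmarking tool and are not modelled here. A minimal Python sketch with made-up counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean.
    Each ratio falls back to 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = precision_recall_f1(tp=60, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

With 60 matched calls, 20 unmatched calls and 40 missed truth-set variants, precision is 0.75, recall is 0.6 and F1 is about 0.667. Because F1 is a harmonic mean, it is pulled toward the weaker of the two, which is why a drop in either one inside TRRs shows up directly in the score.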
Affiliation(s)
- Mingkun Guo
- College of Chemistry, Sichuan University, Chengdu, China
- Shihai Li
- College of Chemistry, Sichuan University, Chengdu, China
- Yifan Zhou
- College of Chemistry, Sichuan University, Chengdu, China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, China
- Zhining Wen
- College of Chemistry, Sichuan University, Chengdu, China; Medical Big Data Center, Sichuan University, Chengdu, China
125
Morishita S, Ichikawa K, Myers EW. Finding long tandem repeats in long noisy reads. Bioinformatics 2021; 37:612-621. [PMID: 33031558 PMCID: PMC8097686 DOI: 10.1093/bioinformatics/btaa865] [Received: 04/24/2020] [Revised: 09/07/2020] [Accepted: 09/23/2020] [Indexed: 11/13/2022]
Abstract
Motivation: Long tandem repeat expansions of more than 1000 nt have been suggested to be associated with diseases. They nevertheless remain largely unexplored in individual human genomes because read lengths have been too short. New long-read sequencing technologies can produce single reads of 10 000 nt or more that span such repeat expansions. These long reads, however, have high error rates (10–20%), which complicates the detection of repetitive elements. Moreover, most traditional algorithms for finding tandem repeats are designed for short tandem repeats (<1000 nt) and cannot effectively handle the high error rate of long reads in a reasonable amount of time. Results: Here, we report an efficient algorithm for this problem that takes advantage of the length of the repeat. A long tandem repeat has hundreds or thousands of approximate copies of the repeated unit, so despite the error rate, many short k-mers will be error-free in many copies of the unit. We exploited this property to develop a two-step method. It first estimates regions that could contain a tandem repeat by analyzing the k-mer frequency distributions of fixed-size windows across the target read. It then assembles the k-mers of each putative region into the consensus repeat unit by greedily traversing a de Bruijn graph. Experimental results indicated that the proposed algorithm substantially outperformed Tandem Repeats Finder, a widely used program for finding tandem repeats, in terms of sensitivity. Availability and implementation: https://github.com/morisUtokyo/mTR.
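The two steps in this abstract can be illustrated with a toy version: flagging k-mer-rich windows, then greedily walking a de Bruijn graph to recover the repeat unit. This is a minimal sketch and not mTR's implementation. The window size, `k`, the `min_count` threshold and all function names are illustrative assumptions. The greedy walk starts from the most frequent k-mer, repeatedly follows the heaviest outgoing edge, and stops when it returns to the start k-mer.

```python
import random
from collections import Counter
from typing import Optional


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def candidate_windows(read: str, k: int = 5, window: int = 200,
                      min_count: int = 8) -> list[tuple[int, int]]:
    """Flag non-overlapping windows whose most frequent k-mer occurs >= min_count times."""
    hits = []
    for start in range(0, max(len(read) - window, 0) + 1, window):
        segment = read[start:start + window]
        counts = kmer_counts(segment, k)
        if counts and counts.most_common(1)[0][1] >= min_count:
            hits.append((start, start + len(segment)))
    return hits


def consensus_unit(region: str, k: int = 3, max_len: int = 1000) -> Optional[str]:
    """Greedily walk the de Bruijn graph from the most frequent k-mer back to itself."""
    counts = kmer_counts(region, k)
    if not counts:
        return None
    start = counts.most_common(1)[0][0]
    cycle, visited, current = [start], {start}, start
    while len(cycle) <= max_len:
        # Outgoing edges of the (k-1)-mer node formed by the current k-mer's suffix.
        successors = [km for km in counts if km[:-1] == current[1:]]
        if not successors:
            return None
        best = max(successors, key=counts.__getitem__)  # greedy: heaviest edge
        if best == start:
            # Closed a cycle: the first base of each k-mer spells one repeat unit.
            return "".join(km[0] for km in cycle)
        if best in visited:
            return None  # looped without passing through the start k-mer
        cycle.append(best)
        visited.add(best)
        current = best
    return None


# Demo on a synthetic read: random flanks around 40 copies of a 5-nt unit.
rng = random.Random(0)


def random_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


read = random_dna(300) + "ACGTT" * 40 + random_dna(300)
print(candidate_windows(read))
region = read[300:500]
noisy = region[:50] + "G" + region[51:]  # inject one substitution error
print(consensus_unit(noisy, k=3))
```

The recovered unit may be any rotation of the true unit, such as `CGTTA` instead of `ACGTT`. This happens because a tandem repeat has no natural start position. A single substitution only adds low-count k-mers, and the greedy walk ignores them.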
Affiliation(s)
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan
- Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Saxony 01307, Germany; Center for Systems Biology Dresden, Dresden, Saxony 01307, Germany
126
Ono Y, Asai K, Hamada M. PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores. Bioinformatics 2021; 37:589-595. [PMID: 32976553 PMCID: PMC8097687 DOI: 10.1093/bioinformatics/btaa835] [Received: 06/22/2020] [Revised: 08/20/2020] [Accepted: 09/11/2020] [Indexed: 12/21/2022]
Abstract
Motivation: Recent high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. Beyond the high error rates, the non-uniformity of errors complicates many downstream analyses of long reads. Many useful simulators that characterize and reproduce long-read error patterns have been developed. However, there is still room for improvement in simulating the non-uniformity of errors. Results: To capture the characteristics of errors in long-read sequencer output, we introduce a generative model for quality scores. It is based on a hidden Markov model (HMM) selected with a recent model selection method called factorized information criteria. We evaluated the simulator from several perspectives, and the results indicate that it produces reads consistent with real reads. Availability and implementation: The source code of PBSIM2 is freely available from https://github.com/yukiteruono/pbsim2. Supplementary information: Supplementary data are available at Bioinformatics online.
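An HMM can produce non-uniform quality scores because it remains in a hidden state for a stretch of positions. A persistent low-quality state therefore yields clustered runs of low scores rather than isolated ones. The sketch below illustrates this idea with a fixed two-state model. PBSIM2 fits its HMM to real data and chooses the model with factorized information criteria, whereas the states and probabilities here are invented for illustration. The function names are likewise hypothetical.

```python
import random

# Hypothetical two-state model; all probabilities are illustrative, not fitted.
START = {"good": 0.8, "bad": 0.2}
TRANSITIONS = {
    "good": {"good": 0.95, "bad": 0.05},
    "bad": {"good": 0.20, "bad": 0.80},
}
EMISSIONS = {  # Phred score -> probability, per hidden state
    "good": {20: 0.2, 25: 0.5, 30: 0.3},
    "bad": {3: 0.4, 6: 0.4, 10: 0.2},
}


def _draw(rng: random.Random, dist: dict):
    """Sample one key from a {value: probability} dict."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values])[0]


def simulate_quality_scores(length: int, start=START, transitions=TRANSITIONS,
                            emissions=EMISSIONS, seed=None) -> list[int]:
    """Emit one Phred score per position, then step the hidden state."""
    rng = random.Random(seed)
    state = _draw(rng, start)
    scores = []
    for _ in range(length):
        scores.append(_draw(rng, emissions[state]))
        state = _draw(rng, transitions[state])
    return scores


def to_fastq_quality(scores: list[int]) -> str:
    """Encode Phred scores as a FASTQ quality string (offset 33)."""
    return "".join(chr(q + 33) for q in scores)


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 log10(P_error)."""
    return 10 ** (-q / 10)


scores = simulate_quality_scores(60, seed=1)
print(to_fastq_quality(scores))
```

In a full simulator, the per-position error probability from `error_probability` would determine where substitutions, insertions and deletions are introduced into the simulated read.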
Affiliation(s)
- Yukiteru Ono
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan; Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan; Institute for Medical-oriented Structural Biology, Waseda University, Tokyo 162-8480, Japan; Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
127
Khorsand P, Denti L, Bonizzoni P, Chikhi R, Hormozdiari F. Comparative genome analysis using sample-specific string detection in accurate long reads. Bioinformatics Advances 2021; 1:vbab005. [PMID: 36700094 PMCID: PMC9710709 DOI: 10.1093/bioadv/vbab005] [Indexed: 01/28/2023]
Abstract
Motivation: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These applications include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and the diagnosis of rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can study repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). Results: We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('sample-specific' strings). We have developed a novel, accurate and efficient computational method for discovering sample-specific strings between two groups of WGS samples. The approach allows comparative genome analysis without mapping the reads, so it is not hindered by the shortcomings of the reference genome or of mapping algorithms. We show that it accurately finds sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples sequenced with accurate long reads (e.g. PacBio HiFi data). Availability and implementation: Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
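The core idea of mapping-free comparison can be shown with fixed-length k-mers. The approach looks for sequence that is well supported in one sample's reads and absent from another's. The sketch below is a deliberately simplified stand-in that uses a k-mer set difference. PingPong itself finds variable-length sample-specific strings, and this code does not reproduce that method. `k`, `min_support` and the function names are illustrative assumptions. The sketch also ignores reverse complements.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count overlapping k-mers across all reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(reads_a: list[str], reads_b: list[str],
                          k: int = 21, min_support: int = 2) -> set[str]:
    """k-mers seen at least min_support times in sample A and never in sample B."""
    counts_a = count_kmers(reads_a, k)
    counts_b = count_kmers(reads_b, k)
    # The support threshold filters out k-mers created by isolated sequencing errors.
    return {km for km, n in counts_a.items() if n >= min_support and km not in counts_b}


# Toy example: the two samples differ at a single base (T vs A near the read end).
child = ["ACGTACGTTT", "ACGTACGTTT"]
parent = ["ACGTACGTAT"]
print(sorted(sample_specific_kmers(child, parent, k=4)))
```

Because no reads are aligned, the comparison does not depend on the reference genome. This mirrors the advantage claimed in the abstract, although it does so in a much cruder form.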
Affiliation(s)
- Luca Denti
- Department of Computational Biology, Institut Pasteur, Paris 75015, France
- Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, 20126, Italy; to whom correspondence should be addressed
- Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, Paris 75015, France; to whom correspondence should be addressed
- Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA 95616, USA; UC Davis MIND Institute, Sacramento, CA 95817, USA; Department of Biochemistry and Molecular Medicine, UC Davis, Sacramento, CA 95817, USA; to whom correspondence should be addressed
128
Boti MA, Adamopoulos PG, Tsiakanikas P, Scorilas A. Nanopore Sequencing Unveils Diverse Transcript Variants of the Epithelial Cell-Specific Transcription Factor Elf-3 in Human Malignancies. Genes (Basel) 2021; 12:genes12060839. [PMID: 34072506 PMCID: PMC8227732 DOI: 10.3390/genes12060839] [Received: 04/30/2021] [Revised: 05/25/2021] [Accepted: 05/27/2021] [Indexed: 02/06/2023]
Abstract
The human E74-like ETS transcription factor 3 (Elf-3) is an epithelium-specific member of the ETS family, all members of which are characterized by a highly conserved DNA-binding domain. Elf-3 plays a crucial role in epithelial cell differentiation by participating in morphogenesis and terminal differentiation of the murine small intestinal epithelium. It also acts as an indispensable regulator of mesenchymal-to-epithelial transition, which underscores its significant involvement in development and in pathological states such as cancer. Previous studies have elucidated the functional role of Elf-3 in normal physiology as well as in tumorigenesis. The present study is the first to reveal the wide spectrum of transcribed ELF3 mRNAs, providing an in-depth analysis of splicing events and exon/intron boundaries in a broad panel of human cell lines. A versatile targeted nanopore sequencing approach led to the identification of 25 novel ELF3 mRNA transcript variants (ELF3 v.3–v.27) with new alternative splicing events, as well as two novel exons. Although the current study provides a qualitative transcriptional profile of ELF3, further studies are needed to elucidate the biological function of all novel alternative transcript variants and of the putative protein isoforms.
129
Scatena C, Murtas D, Tomei S. Cutaneous Melanoma Classification: The Importance of High-Throughput Genomic Technologies. Front Oncol 2021; 11:635488. [PMID: 34123788 PMCID: PMC8193952 DOI: 10.3389/fonc.2021.635488] [Received: 11/30/2020] [Accepted: 03/30/2021] [Indexed: 02/06/2023]
Abstract
Cutaneous melanoma is an aggressive tumor responsible for 90% of skin cancer-related mortality. In recent years, the discovery of driver mutations in melanoma has led to better treatment approaches. The last decade has seen a genomic revolution in the field of cancer, and this revolution has produced an unprecedented amount of data. High-throughput genomic technologies have facilitated the genomic, transcriptomic and epigenomic profiling of several cancers, including melanoma. Nevertheless, a number of newer genomic technologies have not yet been employed in large studies. In this article, we describe the current classification of cutaneous melanoma and review the current knowledge of its main genetic alterations and their impact on targeted therapies. We also describe the most recent high-throughput genomic technologies, highlighting their advantages and disadvantages. We hope that this review will also help scientists identify the most suitable technology for addressing relevant melanoma-related questions. Translating this knowledge and current advances into clinical practice will help to better define the different molecular subsets of melanoma patients and provide new tools for addressing relevant questions on disease management. Genomic technologies might indeed allow better prediction of the biological, and subsequently clinical, behavior of each subset of melanoma patients. They might also allow identification of all molecular changes in tumor cell populations during disease evolution, moving toward truly personalized medicine.
Affiliation(s)
- Cristian Scatena
- Division of Pathology, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Daniela Murtas
- Department of Biomedical Sciences, Section of Cytomorphology, University of Cagliari, Cagliari, Italy
- Sara Tomei
- Omics Core, Integrated Genomics Services, Research Department, Sidra Medicine, Doha, Qatar
130
Stielow B, Simon C, Liefke R. Making fundamental scientific discoveries by combining information from literature, databases, and computational tools - An example. Comput Struct Biotechnol J 2021; 19:3027-3033. [PMID: 34136100 PMCID: PMC8175269 DOI: 10.1016/j.csbj.2021.04.052] [Received: 02/26/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/18/2022]
Abstract
In recent years, the amount of available literature, data and computational tools has increased exponentially, creating both opportunities and challenges for making use of this vast amount of material. Here, we describe how we used publicly available information to identify the previously poorly characterized protein SAMD1 (SAM domain-containing protein 1) as a novel unmethylated CpG island-binding protein. This discovery illustrates how the wealth of material and tools on the internet can be used to make scientific breakthroughs. It also illustrates the hurdles that may arise. Specifically, we discuss how the misrepresentation of SAMD1 in literature and databases may have prevented an earlier characterization of this protein, and we address what can be learned from this example.
Affiliation(s)
- Bastian Stielow
- Institute of Molecular Biology and Tumor Research (IMT), Philipps University of Marburg, 35043 Marburg, Germany
- Clara Simon
- Institute of Molecular Biology and Tumor Research (IMT), Philipps University of Marburg, 35043 Marburg, Germany
- Robert Liefke
- Institute of Molecular Biology and Tumor Research (IMT), Philipps University of Marburg, 35043 Marburg, Germany
- Department of Hematology, Oncology and Immunology, University Hospital Giessen and Marburg, 35043 Marburg, Germany
- Corresponding author at: Institute of Molecular Biology and Tumor Research (IMT), Philipps University of Marburg, 35043 Marburg, Germany.
131
Abstract
The first gapless, telomere-to-telomere sequence of a human autosome, chromosome 8, is complete. Sequencing and assembly of the corresponding centromere in the chimpanzee, orangutan and macaque reveals details of its rapid evolution over the past 25 million years.
Affiliation(s)
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
132
Savara J, Novosád T, Gajdoš P, Kriegová E. Comparison of structural variants detected by optical mapping with long-read next-generation sequencing. Bioinformatics 2021; 37:3398-3404. [PMID: 33983367 DOI: 10.1093/bioinformatics/btab359] [Received: 02/11/2021] [Revised: 04/21/2021] [Accepted: 05/08/2021] [Indexed: 12/29/2022]
Abstract
Motivation: Recent studies have shown the potential of long-read whole-genome sequencing (WGS) approaches and optical mapping (OM) for the detection of clinically relevant structural variants (SVs) in cancer research. Three main long-read WGS platforms are currently in use: Pacific Biosciences (PacBio), Oxford Nanopore Technologies (ONT) and 10x Genomics. Recently, whole-genome OM technology (Bionano Genomics) has been introduced into human diagnostics. Questions remain about the accuracy of these long-read sequencing platforms, how comparable or interchangeable they are when searching for SVs, and to what extent they can be replaced or supplemented by OM. Moreover, no tool can effectively compare SVs obtained by OM and WGS. Results: This study compared optical maps of the breast cancer cell line SKBR3 with AnnotSV outputs from the WGS platforms. For this purpose, a software tool with comparison and filtering features was developed. With a distance variance threshold of up to 50 kbp, the majority of SVs found by OM were confirmed by all WGS platforms. In particular, 99% of translocations and 80% of deletions found by OM were confirmed by both PacBio and ONT, with ∼70% being confirmed by 10x Genomics in combination with PacBio and/or ONT. Interestingly, long deletions (>100 kbp) were detected only by 10x Genomics. Regarding insertions, ∼72% were confirmed by PacBio and ONT, but none by 10x Genomics. Inversions and duplications detected by OM were not detected by WGS. The tool also enabled the confirmation of SVs that overlapped the same gene(s) and was applied to filtering disease-associated SVs. Availability: https://github.com/novosadt/om-annotsv-svc.
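Confirming an OM call against WGS calls within a distance threshold can be sketched as a simple breakpoint comparison. This is a minimal illustration, not the logic of the published tool. The matching rule used here is an assumption: the same chromosome, the same SV type, and both breakpoints within `max_distance`. The `SV` record and the function names are likewise hypothetical. Translocations, whose breakpoints lie on two chromosomes, are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """A hypothetical intrachromosomal SV call."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS"


def is_confirmed(query: SV, calls: list[SV], max_distance: int = 50_000) -> bool:
    """True if some call shares chromosome and type and both breakpoints lie within max_distance."""
    return any(
        c.chrom == query.chrom
        and c.svtype == query.svtype
        and abs(c.start - query.start) <= max_distance
        and abs(c.end - query.end) <= max_distance
        for c in calls
    )


def confirmed_fraction(om_calls: list[SV], wgs_calls: list[SV],
                       max_distance: int = 50_000) -> float:
    """Fraction of OM calls confirmed by at least one WGS call."""
    if not om_calls:
        return 0.0
    return sum(is_confirmed(sv, wgs_calls, max_distance) for sv in om_calls) / len(om_calls)


# Hypothetical calls: the deletion matches within 50 kbp, the insertion does not.
om = [SV("chr1", 100_000, 150_000, "DEL"), SV("chr2", 1_000_000, 1_000_500, "INS")]
wgs = [SV("chr1", 120_000, 160_000, "DEL"), SV("chr2", 2_000_000, 2_000_100, "INS")]
print(confirmed_fraction(om, wgs))
```

Optical maps locate breakpoints only approximately, which motivates a generous tolerance such as 50 kbp. A larger threshold, however, also raises the chance of spurious matches between unrelated SVs.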
Affiliation(s)
- Jakub Savara
- Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
- Department of Immunology, Faculty of Medicine and Dentistry, Palacký University in Olomouc and University Hospital Olomouc, 779 00, Olomouc, Czech Republic
- Tomáš Novosád
- Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
- Petr Gajdoš
- Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
- Eva Kriegová
- Department of Immunology, Faculty of Medicine and Dentistry, Palacký University in Olomouc and University Hospital Olomouc, 779 00, Olomouc, Czech Republic
133
Eslami Rasekh M, Hernández Y, Drinan SD, Fuxman Bass J, Benson G. Genome-wide characterization of human minisatellite VNTRs: population-specific alleles and gene expression differences. Nucleic Acids Res 2021; 49:4308-4324. [PMID: 33849068 PMCID: PMC8096271 DOI: 10.1093/nar/gkab224] [Received: 11/03/2020] [Revised: 03/06/2021] [Accepted: 03/18/2021] [Indexed: 11/12/2022]
Abstract
Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole-genome sequencing datasets from 2770 individuals to detect minisatellite VNTRs, i.e. those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e. with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were enriched in genomic regions with regulatory function, i.e. transcription start sites and enhancers. Investigating the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles. These alleles could be used to classify individuals into super-populations with near-perfect accuracy. A search for expression quantitative trait loci (eQTLs) among VNTRs proximal to genes indicated that expression differences in 187 genes correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through the identification of predicted alleles in long reads, and through comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
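The "commonly polymorphic" criterion in this abstract reduces to a frequency threshold. The sketch below assumes that ">5% of the population" means the fraction of individuals who carry at least one non-reference allele. An allele-frequency interpretation would be equally plausible, and the paper's exact definition may differ. The genotype encoding (a pair of copy numbers per individual) and the function name are also hypothetical.

```python
def is_commonly_polymorphic(genotypes: list[tuple[int, int]], ref_copies: int,
                            threshold: float = 0.05) -> bool:
    """True if more than `threshold` of individuals carry a non-reference allele.

    genotypes: one (allele1, allele2) pair of repeat copy numbers per individual.
    ref_copies: copy number of the reference allele at this locus.
    """
    if not genotypes:
        return False
    carriers = sum(any(allele != ref_copies for allele in g) for g in genotypes)
    return carriers / len(genotypes) > threshold


# Hypothetical locus with a 3-copy reference allele: 6 of 100 individuals carry 4 copies.
population = [(3, 3)] * 94 + [(3, 4)] * 6
print(is_commonly_polymorphic(population, ref_copies=3))
```

The comparison is strict, so a locus with exactly 5% carriers is not classified as commonly polymorphic.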
Affiliation(s)
- Yözen Hernández
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Juan I Fuxman Bass
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biology, Boston University, Boston, MA 02215, USA
- Gary Benson
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biology, Boston University, Boston, MA 02215, USA
- Department of Computer Science, Boston University, Boston, MA 02215, USA
134
Zhao X, Collins RL, Lee WP, Weber AM, Jun Y, Zhu Q, Weisburd B, Huang Y, Audano PA, Wang H, Walker M, Lowther C, Fu J, Gerstein MB, Devine SE, Marschall T, Korbel JO, Eichler EE, Chaisson MJP, Lee C, Mills RE, Brand H, Talkowski ME. Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies. Am J Hum Genet 2021; 108:919-928. [PMID: 33789087 PMCID: PMC8206509 DOI: 10.1016/j.ajhg.2021.03.014] [Received: 06/24/2020] [Accepted: 03/12/2021] [Indexed: 12/13/2022]
Abstract
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives rely on short-read whole-genome sequencing (srWGS). Compared with emerging long-read WGS (lrWGS) technologies, srWGS presents challenges for the detection of structural variants (SVs). Given the ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly. We also sought to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses of three families from the Human Genome Structural Variation Consortium (HGSVC) captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class. Segmental duplications (SDs) and simple repeats (SRs) make up 9.7% of the current GRCh38 reference, yet 91.4% of deletions specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of the reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detecting insertions across all genomic contexts. Non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons. The gain in sensitivity from lrWGS for discovering novel pathogenic deletions in these currently interpretable genomic regions is therefore likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS for creating new catalogs of insertions and transposable elements. The same holds for disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
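The concentration of lrWGS-specific deletions in SD/SR sequence can be expressed as a fold enrichment. According to the abstract, 91.4% of these deletions fall in 9.7% of the reference. Under uniform placement across the reference, that share would correspond to roughly a 9.4-fold excess. The sketch below reproduces this arithmetic from the quoted percentages only. The uniform-placement null expectation is an assumption of this illustration and is not an analysis reported in the paper. The function name is hypothetical.

```python
def fold_enrichment(fraction_of_calls: float, fraction_of_reference: float) -> float:
    """Observed share of calls in a region class divided by that class's share of the reference."""
    return fraction_of_calls / fraction_of_reference


# Percentages quoted in the abstract (lrWGS-specific deletions vs. GRCh38 sequence).
in_sd_sr = fold_enrichment(0.914, 0.097)
elsewhere = fold_enrichment(1 - 0.914, 1 - 0.097)
print(f"SD/SR: {in_sd_sr:.1f}x, rest of reference: {elsewhere:.2f}x")
```

The complementary figure, about 0.1-fold outside SD/SR, is consistent with the high concordance between technologies that the abstract reports for deletions in the remaining 90.3% of the reference.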
Affiliation(s)
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA 02115, USA
- Wan-Ping Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Alexandra M Weber
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
- Yukyung Jun
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Ben Weisburd
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Yongqing Huang
- Data Sciences Platform, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Harold Wang
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Mark Walker
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Chelsea Lowther
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jack Fu
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Mark B Gerstein
- Yale University Medical School, Computational Biology and Bioinformatics Program, New Haven, CT 06520, USA
- Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Mark J P Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Department of Graduate Studies - Life Sciences, Ewha Womans University, 52, Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, South Korea; Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an 710061, Shaanxi, People's Republic of China
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA 02115, USA.
135
Kronenberg ZN, Rhie A, Koren S, Concepcion GT, Peluso P, Munson KM, Porubsky D, Kuhn K, Mueller KA, Low WY, Hiendleder S, Fedrigo O, Liachko I, Hall RJ, Phillippy AM, Eichler EE, Williams JL, Smith TPL, Jarvis ED, Sullivan ST, Kingan SB. Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C. Nat Commun 2021; 12:1935. [PMID: 33911078 PMCID: PMC8081726 DOI: 10.1038/s41467-020-20536-y] [Received: 05/13/2020] [Accepted: 11/12/2020] [Indexed: 01/27/2023]
Abstract
Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, such assemblies have been created most reliably with complex protocols. These include cultured cells that contain a single-haplotype (haploid) genome, single cells in which haplotypes are separated, and co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend the phase blocks of partially phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, which lets it skip variant calling and reduces the computational complexity of phasing. We validated the method on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP): human, cow and zebra finch. For these datasets, high-quality, fully haplotype-resolved assemblies are available from the trio-based approach. FALCON-Phase is accurate without parental data and performs better in samples with higher heterozygosity. Its accuracy is 97% for cow and zebra finch, compared with 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.
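Hi-C can phase an assembly because chromatin contacts are much more frequent within one chromosome copy than between homologs. Two haplotigs at neighboring bubbles that share many Hi-C links are therefore likely to lie on the same haplotype. The sketch below applies this intuition greedily to consecutive pairs of bubbles. It is a deliberate simplification and not FALCON-Phase's actual statistical model. The link-count format, the "A"/"B" haplotig labels and the function name are hypothetical.

```python
def phase_bubbles(adjacent_links: list[dict[tuple[str, str], int]]) -> list[str]:
    """Greedily assign haplotigs A/B at consecutive bubbles to one phase.

    adjacent_links[i] maps (haplotig at bubble i, haplotig at bubble i+1) to a
    Hi-C link count. Returns the label placed in phase 0 at each bubble;
    phase 1 takes the other label.
    """
    phase0 = ["A"]  # arbitrary choice for the first bubble
    for links in adjacent_links:
        cis = links.get(("A", "A"), 0) + links.get(("B", "B"), 0)
        trans = links.get(("A", "B"), 0) + links.get(("B", "A"), 0)
        previous = phase0[-1]
        if trans > cis:
            # A at bubble i pairs with B at bubble i+1, so the label switches.
            phase0.append("B" if previous == "A" else "A")
        else:
            phase0.append(previous)
    return phase0


# Hypothetical link counts for three bubbles (two adjacent pairs).
links = [
    {("A", "A"): 50, ("B", "B"): 40, ("A", "B"): 2, ("B", "A"): 3},
    {("A", "A"): 1, ("B", "B"): 2, ("A", "B"): 60, ("B", "A"): 55},
]
phase0 = phase_bubbles(links)
phase1 = ["B" if label == "A" else "A" for label in phase0]
print(phase0, phase1)
```

A purely pairwise, greedy rule lets a single misleading pair of bubbles flip every downstream assignment, so a real tool would need a model that weighs evidence across many bubbles. Homozygous regions also contribute fewer distinguishing links, which is consistent with the abstract's observation that accuracy is higher in more heterozygous samples.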
Affiliation(s)
- Zev N Kronenberg
- Phase Genomics, Seattle, WA, USA.
- Pacific Biosciences, Menlo Park, CA, USA.
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | | | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kristen Kuhn
- US Meat Animal Research Center, ARS USDA, Clay Center, NE, USA
| | | | - Wai Yee Low
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
| | - Stefan Hiendleder
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
| | - Olivier Fedrigo
- Vertebrate Genomes Laboratory, The Rockefeller University, New York, NY, USA
| | | | | | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - John L Williams
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
- Dipartimento di Scienze Animali, della Nutrizione e degli Alimenti, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
| | | | - Erich D Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
136
Du N, Shang J, Sun Y. Improving protein domain classification for third-generation sequencing reads using deep learning. BMC Genomics 2021; 22:251. [PMID: 33836667 PMCID: PMC8033682 DOI: 10.1186/s12864-021-07468-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/28/2020] [Accepted: 02/19/2021] [Indexed: 12/21/2022]
Abstract
BACKGROUND With the development of third-generation sequencing (TGS) technologies, researchers can obtain DNA sequences tens to hundreds of kilobases long. These long reads allow protein domain annotation without assembly and can thus provide important insights into the biological functions of the underlying data. However, the high error rate of TGS data poses a new challenge to established domain analysis pipelines. State-of-the-art methods are not optimized for noisy reads and have shown unsatisfactory domain classification accuracy on TGS data, so new computational methods are still needed to improve domain prediction in long noisy reads. RESULTS In this work, we introduce ProDOMA, a deep learning model that performs domain classification for TGS reads. It uses deep neural networks with 3-frame translation encoding to learn conserved features from partially correct translations. In addition, we formulate the task as an open-set problem, so the model can reject reads that do not contain the targeted domains. In experiments on simulated long reads of protein-coding sequences and on real TGS reads from the human genome, our model outperforms HMMER and DeepFam on protein domain classification. CONCLUSIONS ProDOMA is a useful end-to-end protein domain analysis tool for long noisy reads that does not rely on error correction.
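The abstract mentions a "3-frame translation encoding" but not its exact form. The sketch below shows only the translation step such an encoding would start from, assuming the standard genetic code (NCBI table 1) and the forward strand only; how ProDOMA then encodes the three peptide strings for its network is not shown and is not claimed here. Codons containing sequencing noise such as `N` are mapped to `X`.

```python
from typing import List

# NCBI standard genetic code (translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_frame_translate(read: str) -> List[str]:
    """Return the read's translations in forward frames 0, 1 and 2."""
    read = read.upper()
    return [translate(read[frame:]) for frame in range(3)]


print(three_frame_translate("ATGGCC"))  # ['MA', 'W', 'G']
```

Because an insertion or deletion error in a long read shifts the reading frame, the correct protein may be split across different frames along the read, which is one plausible reason to present all three frames to a model rather than a single translation.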
Affiliation(s)
- Nan Du
- Computer Science and Engineering, Michigan State University, East Lansing, 48824 USA
- Jiayu Shang
- Electrical Engineering, City University of Hong Kong, Hong Kong, People’s Republic of China
- Yanni Sun
- Electrical Engineering, City University of Hong Kong, Hong Kong, People’s Republic of China
137
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 316] [Impact Index Per Article: 105.3] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
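The contiguity figure quoted above ("minimum contig length needed to cover 50% of the genome") is an N50-style statistic. A minimal sketch of how such a value is computed from a list of contig lengths, assuming the common convention that N50 is measured against total assembly length and NG50 against an estimated genome size; the function name `nx` is illustrative only:

```python
from typing import Iterable, Optional


def nx(lengths: Iterable[int], fraction: float = 0.5, total: Optional[int] = None) -> int:
    """Smallest contig length L such that contigs of length >= L cover `fraction` of `total`.

    With total=None the assembly length is used (N50 when fraction=0.5); passing an
    estimated genome size instead gives NG50. Returns 0 if the contigs never reach
    the target, which can happen for NG50 on a fragmented assembly.
    """
    contigs = sorted(lengths, reverse=True)
    if total is None:
        total = sum(contigs)
    target = fraction * total
    covered = 0
    for length in contigs:  # longest contigs first
        covered += length
        if covered >= target:
            return length
    return 0


print(nx([10, 20, 30, 40]))             # 30 (N50 against the 100-unit assembly)
print(nx([10, 20, 30, 40], total=200))  # 10 (NG50 against a 200-unit genome)
```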
Affiliation(s)
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
- Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Martin Santamarina
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Nelson T Chuang
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Junjie Chen
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aaron M Wenger
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
- Maryam Ghareghani
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbrücken, Germany
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Allison A Regier
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
- Haley J Abel
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
- Ira M Hall
- Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Oliver Stegle
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
- Jose M C Tubio
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Zepeng Mu
- Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
- Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Ashley D Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
- Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China
- Department of Graduate Studies-Life Sciences, Ewha Womans University, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, South Korea
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
138
Ranallo-Benavidez TR, Lemmon Z, Soyk S, Aganezov S, Salerno WJ, McCoy RC, Lippman ZB, Schatz MC, Sedlazeck FJ. Optimized sample selection for cost-efficient long-read population sequencing. Genome Res 2021; 31:910-918. [PMID: 33811084 PMCID: PMC8092009 DOI: 10.1101/gr.264879.120] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/06/2020] [Accepted: 03/30/2021] [Indexed: 11/24/2022]
Abstract
An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g., microarrays, exome capture, short-read WGS), from which a few individuals are resequenced using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, studies of human variation have historically focused on individuals with European ancestry, but this represents a small fraction of the overall diversity. To address this, SVCollector identifies the optimal subset of individuals for resequencing by analyzing population-level VCF files from low-resolution genotyping studies. It then computes a ranked list of samples that maximizes the total number of variants present within a subset of a given size. To solve this optimization problem, SVCollector implements a fast greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector to simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3000 Rice Genomes Project and show that the rankings it computes are more representative than alternative naive strategies. When selecting an optimal subset of 100 samples in these cohorts, SVCollector identifies individuals from every subpopulation, whereas naive methods yield an unbalanced selection. Finally, we show that the number of variants present in cohorts selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples.
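The greedy heuristic described above is an instance of greedy maximum coverage: at each step, pick the sample that adds the most variants not yet covered by the samples already chosen. A minimal sketch, assuming each sample's variants have already been parsed from the VCF into a set of variant IDs (a hypothetical input format); this is not SVCollector's own code, and its exact ILP mode is not shown:

```python
from typing import Dict, List, Set, Tuple


def greedy_rank(samples: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank samples by the number of not-yet-covered variants each one adds.

    samples maps a sample ID to the set of variant IDs it carries. Ties are
    broken by sample ID (alphabetically first wins) so the ranking is deterministic.
    """
    covered: Set[str] = set()
    remaining = dict(samples)
    ranking: List[Tuple[str, int]] = []
    while remaining:
        # max() returns the first maximal element, so iterating in sorted order
        # makes ties resolve to the alphabetically smallest sample ID.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = len(remaining[best] - covered)
        ranking.append((best, gain))
        covered |= remaining.pop(best)
    return ranking


cohort = {"A": {"v1", "v2", "v3"}, "B": {"v3", "v4"}, "C": {"v4", "v5", "v6", "v7"}}
print(greedy_rank(cohort))  # [('C', 4), ('A', 3), ('B', 0)]
```

Taking the first k entries of the ranking gives a k-sample subset; the marginal gains shrink as k grows, which is the diminishing-returns behaviour the abstract relates to the allele frequency spectrum.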
Affiliation(s)
- Zachary Lemmon
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Sebastian Soyk
- Center for Integrative Genomics, University of Lausanne, Lausanne 1005, Switzerland
- William J Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Rajiv C McCoy
- Johns Hopkins University, Baltimore, Maryland 21218, USA
- Zachary B Lippman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Michael C Schatz
- Johns Hopkins University, Baltimore, Maryland 21218, USA
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
139
Pauper M, Kucuk E, Wenger AM, Chakraborty S, Baybayan P, Kwint M, van der Sanden B, Nelen MR, Derks R, Brunner HG, Hoischen A, Vissers LELM, Gilissen C. Long-read trio sequencing of individuals with unsolved intellectual disability. Eur J Hum Genet 2021; 29:637-648. [PMID: 33257779 PMCID: PMC8115091 DOI: 10.1038/s41431-020-00770-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/15/2020] [Accepted: 10/27/2020] [Indexed: 02/06/2023]
Abstract
Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To assess this potential, we performed LRS at approximately 15×-40× genome coverage on five trios using the Pacific Biosciences Sequel I System. The probands had been diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exome and genome sequencing. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was accessible only by LRS and not by SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed us to study segregation, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of the high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome captured only by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared with SRS for identifying medically relevant genome variation.
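The "Mendelian inheritance concordance" reported above is, in its simplest form, the fraction of trio genotypes in which the child's two alleles can be explained by one allele from each parent. A minimal sketch, assuming unphased genotypes given as pairs of allele indices with no missing calls; how the study itself handled missing genotypes or filtered sites is not stated in the abstract and is not reproduced here:

```python
from typing import Sequence, Tuple

Genotype = Tuple[int, int]  # unphased allele indices, e.g. (0, 1) for a heterozygote


def is_mendelian(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child can have inherited one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_concordance(trios: Sequence[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) genotype triples that are Mendelian-consistent."""
    if not trios:
        return 0.0
    consistent = sum(is_mendelian(c, m, f) for c, m, f in trios)
    return consistent / len(trios)


sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from mother, 1 from father
    ((1, 1), (0, 0), (0, 1)),  # inconsistent: mother carries no alt allele
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((1, 1), (0, 1), (1, 1)),  # consistent
]
print(mendelian_concordance(sites))  # 0.75
```

The same check explains why trios help filter false-positive calls: an erroneous call in the child often produces a genotype that neither parent could have transmitted.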
Affiliation(s)
- Marc Pauper
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Erdi Kucuk
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
- Michael Kwint
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Bart van der Sanden
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 HR, Nijmegen, The Netherlands
- Marcel R Nelen
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Ronny Derks
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Han G Brunner
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
- Alexander Hoischen
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
- Department of Internal Medicine, Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, The Netherlands
- Lisenka E L M Vissers
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 HR, Nijmegen, The Netherlands
- Christian Gilissen
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
140
Macken WL, Vandrovcova J, Hanna MG, Pitceathly RDS. Applying genomic and transcriptomic advances to mitochondrial medicine. Nat Rev Neurol 2021; 17:215-230. [PMID: 33623159 DOI: 10.1038/s41582-021-00455-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Accepted: 01/06/2021] [Indexed: 02/07/2023]
Abstract
Next-generation sequencing (NGS) has increased our understanding of the molecular basis of many primary mitochondrial diseases (PMDs). Despite this progress, many patients with suspected PMD remain without a genetic diagnosis, which restricts their access to in-depth genetic counselling, reproductive options and clinical trials, in addition to hampering efforts to understand the underlying disease mechanisms. Although they represent a considerable improvement over their predecessors, current methods for sequencing the mitochondrial and nuclear genomes have important limitations, and molecular diagnostic techniques are often manual and time consuming. However, recent advances in genomics and transcriptomics offer realistic solutions to these challenges. In this Review, we discuss the current genetic testing approach for PMDs and the opportunities that exist for increased use of whole-genome NGS of nuclear and mitochondrial DNA (mtDNA) in the clinical environment. We consider the possible role for long-read approaches in sequencing of mtDNA and in the identification of novel nuclear genomic causes of PMDs. We examine the expanding applications of RNA sequencing, including the detection of cryptic variants that affect splicing and gene expression and the interpretation of rare and novel mitochondrial transfer RNA variants.
Affiliation(s)
- William L Macken
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK
- Jana Vandrovcova
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK
- Michael G Hanna
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK
- Robert D S Pitceathly
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK
141
Blom MPK. Opportunities and challenges for high-quality biodiversity tissue archives in the age of long-read sequencing. Mol Ecol 2021; 30:5935-5948. [PMID: 33786900 DOI: 10.1111/mec.15909] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/31/2020] [Revised: 03/06/2021] [Accepted: 03/22/2021] [Indexed: 12/11/2022]
Abstract
The technological ability to characterize genetic variation at a genome-wide scale provides an unprecedented opportunity to study the genetic underpinnings and evolutionary mechanisms that promote and sustain biodiversity. The transition from short- to long-read sequencing is particularly promising and allows a more holistic view of changes in genetic diversity across time and space. Long-read sequencing has tremendous potential, but sequencing success strongly depends on the long-range integrity of DNA molecules and therefore on the availability of high-quality tissue samples. With the scope of genomic experiments expanding and wild populations simultaneously disappearing at an unprecedented rate, access to high-quality samples may soon be a major concern for many projects. The need for high-quality biodiversity tissue archives is therefore urgent, but sampling and preserving high-quality samples is not a trivial exercise. In this review, I will briefly outline how long-read sequencing can benefit the study of molecular ecology, how this will substantially increase the demand for high-quality tissues and why it is challenging to preserve DNA integrity. I will then provide an overview of preservation approaches and end with a call for support to acknowledge the efforts needed to assemble high-quality tissue archives. In doing so, I hope to simultaneously motivate field biologists to expand sampling practices and molecular biologists to develop cost-efficient guidelines for the sampling and long-term storage of tissues. A concerted, interdisciplinary effort is needed to catalogue the genetic variation underlying contemporary biodiversity and will eventually provide a critical resource for future studies.
Affiliation(s)
- Mozes P K Blom
- Leibniz Institut für Evolutions- und Biodiversitätsforschung, Museum für Naturkunde, Berlin, Germany
142
Liu Q, Liaquat F, He Y, Munis MFH, Zhang C. Functional Annotation of a Full-Length Transcriptome and Identification of Genes Associated with Flower Development in Rhododendron simsii (Ericaceae). Plants (Basel) 2021; 10:649. [PMID: 33805478 PMCID: PMC8065783 DOI: 10.3390/plants10040649] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/27/2021] [Revised: 03/21/2021] [Accepted: 03/24/2021] [Indexed: 11/16/2022]
Abstract
Rhododendron simsii is one of the top ten famous flowers in China and, owing to its historical value and high aesthetic appeal, is widely popular among Chinese people. Flower color variety is an important breeding objective in Rhododendron L., and understanding the molecular mechanism of flower color formation can provide a theoretical basis for improving flower color in the genus. To generate the R. simsii transcriptome, PacBio sequencing technology was used. A total of 833,137 full-length non-chimeric reads were obtained, and 726,846 high-quality full-length transcripts were identified. Moreover, 40,556 open reading frames were obtained, of which 36,018 were complete. In gene annotation analyses, 39,411, 18,565, 16,102 and 17,450 transcripts were assigned to the GO, Nr, KEGG and COG databases, respectively. To identify long non-coding RNAs (lncRNAs), we used four computational methods, Protein families (Pfam), Coding Potential Calculator (CPC), Coding-Potential Assessment Tool (CPAT) and Coding-Non-Coding Index (CNCI), and identified 6170, 2265, 4084 and 1240 lncRNAs, respectively. Most genes were enriched in the flavonoid biosynthetic pathway. Eight key genes of the anthocyanin biosynthetic pathway were further selected and analyzed by qRT-PCR. F3'H and ANS showed an upward trend across the developmental stages of R. simsii, and F3'5'H and FLS showed the highest expression during petal color formation. This research provides a large number of full-length transcripts that will facilitate genetic analyses of R. simsii, a native semi-deciduous shrub.
Affiliation(s)
- Qunlu Liu
- Department of Landscape Architecture, School of Design, Shanghai Jiao Tong University, Shanghai 200240, China
- Fiza Liaquat
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yefeng He
- Department of Landscape Architecture, School of Design, Shanghai Jiao Tong University, Shanghai 200240, China
- Chunying Zhang
- Shanghai Engineering Research Center of Sustainable Plant Innovation, Shanghai Botanical Garden, Shanghai 200231, China
143
Gusic M, Prokisch H. Genetic basis of mitochondrial diseases. FEBS Lett 2021; 595:1132-1158. [PMID: 33655490 DOI: 10.1002/1873-3468.14068] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 01/25/2021] [Revised: 02/17/2021] [Accepted: 02/18/2021] [Indexed: 12/13/2022]
Abstract
Mitochondrial disorders are monogenic disorders characterized by a defect in oxidative phosphorylation and caused by pathogenic variants in one of over 340 different genes. The implementation of whole-exome sequencing has revolutionized their diagnosis, doubled the number of associated disease genes, and significantly increased the diagnosed fraction. However, the genetic etiology remains unknown in a substantial fraction of patients with mitochondrial disorders, highlighting limitations in variant detection and interpretation that call for improved computational and DNA sequencing methods, as well as the addition of OMICS tools. More intriguingly, this also suggests that some pathogenic variants lie outside protein-coding genes and that mechanisms beyond Mendelian inheritance and the mtDNA are relevant. This review covers the current status of the genetic basis of mitochondrial diseases, discusses current challenges and perspectives, and explores the contribution of factors beyond protein-coding regions and monogenic inheritance to the expansion of the genetic spectrum of disease.
Affiliation(s)
- Mirjana Gusic
- Institute of Neurogenomics, Helmholtz Zentrum München, Neuherberg, Germany
- Institute of Human Genetics, Technical University of Munich, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Munich Heart Alliance, Germany
| | - Holger Prokisch
- Institute of Neurogenomics, Helmholtz Zentrum München, Neuherberg, Germany
- Institute of Human Genetics, Technical University of Munich, Germany
144
de Bruijn SE, Fadaie Z, Cremers FPM, Kremer H, Roosing S. The Impact of Modern Technologies on Molecular Diagnostic Success Rates, with a Focus on Inherited Retinal Dystrophy and Hearing Loss. Int J Mol Sci 2021; 22:2943. [PMID: 33799353 PMCID: PMC7998853 DOI: 10.3390/ijms22062943] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/05/2021] [Revised: 03/04/2021] [Accepted: 03/09/2021] [Indexed: 02/07/2023]
Abstract
The identification of pathogenic variants in monogenic diseases has been of interest to researchers and clinicians for several decades. However, for inherited diseases with extremely high genetic heterogeneity, such as hearing loss and retinal dystrophies, establishing a molecular diagnosis requires an enormous effort. In this review, we use these two genetic conditions as examples to describe the initial molecular genetic identification approaches, as performed since the early 1990s, and the improvements and refinements introduced over the years. Next, the history of DNA sequencing from conventional Sanger sequencing to high-throughput massively parallel sequencing, a.k.a. next-generation sequencing, is outlined, including their advantages and limitations and their impact on identifying the remaining genetic defects. Moreover, we review the development of recent technologies, also termed "third-generation" sequencing, which hold the promise of overcoming these limitations. Furthermore, we outline the importance and complexity of variant interpretation in clinical diagnostic settings, given the massive number of different variants identified by these methods. Finally, we briefly mention the development of novel approaches such as optical mapping and multiomics, which can help to further identify genetic defects in the near future.
Affiliation(s)
- Suzanne E. de Bruijn
- Department of Human Genetics, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
| | - Zeinab Fadaie
- Department of Human Genetics, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
| | - Frans P. M. Cremers
- Department of Human Genetics, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
| | - Hannie Kremer
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
- Department of Otorhinolaryngology, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
| | - Susanne Roosing
- Department of Human Genetics, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
145
Kumar A, Adhikari S, Kankainen M, Heckman CA. Comparison of Structural and Short Variants Detected by Linked-Read and Whole-Exome Sequencing in Multiple Myeloma. Cancers (Basel) 2021; 13:1212. [PMID: 33802025 PMCID: PMC7999337 DOI: 10.3390/cancers13061212] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/31/2020] [Revised: 03/07/2021] [Accepted: 03/08/2021] [Indexed: 02/07/2023]
Abstract
Linked-read sequencing was developed to aid the detection of large structural variants (SVs) from short-read sequencing efforts. We performed a systematic evaluation to determine if linked-read exome sequencing provides more comprehensive and clinically relevant information than whole-exome sequencing (WES) when applied to the same set of multiple myeloma patient samples. We report that linked-read sequencing detected a higher number of SVs (n = 18,455) than WES (n = 4065). However, linked-read predictions were dominated by inversions (92.4%), leading to poor detection of other types of SVs. In contrast, WES detected 56.3% deletions, 32.6% insertions, 6.7% translocations, 3.3% duplications and 1.2% inversions. Surprisingly, the quantitative performance assessment suggested a higher performance for WES (AUC = 0.791) compared to linked-read sequencing (AUC = 0.766) for detecting clinically validated cytogenetic alterations. We also found that linked-read sequencing detected more short variants (n = 704) compared to WES (n = 109). WES detected somatic mutations in all MM-related genes while linked-read sequencing failed to detect certain mutations. The comparison of somatic mutations detected using linked-read, WES and RNA-seq revealed that WES and RNA-seq detected more mutations than linked-read sequencing. These data indicate that WES outperforms and is more efficient than linked-read sequencing for detecting clinically relevant SVs and MM-specific short variants.
Affiliation(s)
- Ashwini Kumar
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
| | - Sadiksha Adhikari
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
| | - Matti Kankainen
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
- Medical and Clinical Genetics, University of Helsinki, Helsinki University Hospital, 00029 Helsinki, Finland
- Translational Immunology Research Program and Department of Clinical Chemistry, University of Helsinki, 00290 Helsinki, Finland
- Hematology Research Unit Helsinki, Department of Hematology, Helsinki University Hospital Comprehensive Cancer Center, 00290 Helsinki, Finland
| | - Caroline A. Heckman
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
146
Chakraborty M, Chang CH, Khost DE, Vedanayagam J, Adrion JR, Liao Y, Montooth KL, Meiklejohn CD, Larracuente AM, Emerson JJ. Evolution of genome structure in the Drosophila simulans species complex. Genome Res 2021; 31:380-396. [PMID: 33563718 PMCID: PMC7919458 DOI: 10.1101/gr.263442.120] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 03/12/2020] [Accepted: 12/28/2020] [Indexed: 12/25/2022]
Abstract
The rapid evolution of repetitive DNA sequences, including satellite DNA, tandem duplications, and transposable elements, underlies phenotypic evolution and contributes to hybrid incompatibilities between species. However, repetitive genomic regions are fragmented and misassembled in most contemporary genome assemblies. We generated highly contiguous de novo reference genomes for the Drosophila simulans species complex (D. simulans, D. mauritiana, and D. sechellia), which speciated ∼250,000 yr ago. Our assemblies are comparable in contiguity and accuracy to the current D. melanogaster genome, allowing us to directly compare repetitive sequences between these four species. We find that at least 15% of the D. simulans complex species genomes fail to align uniquely to D. melanogaster owing to structural divergence (twice the number of single-nucleotide substitutions). We also find rapid turnover of satellite DNA and extensive structural divergence in heterochromatic regions, whereas the euchromatic gene content is mostly conserved. Despite the overall preservation of gene synteny, euchromatin in each species has been shaped by clade- and species-specific inversions, transposable elements, expansions and contractions of satellite and tRNA tandem arrays, and gene duplications. We also find rapid divergence among Y-linked genes, including copy number variation and recent gene duplications from autosomes. Our assemblies provide a valuable resource for studying genome evolution and its consequences for phenotypic evolution in these genetic model species.
Affiliation(s)
- Mahul Chakraborty
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
| | - Ching-Ho Chang
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
| | - Danielle E Khost
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
- FAS Informatics and Scientific Applications, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Jeffrey Vedanayagam
- Department of Developmental Biology, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA
| | - Jeffrey R Adrion
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, USA
| | - Yi Liao
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
| | - Kristi L Montooth
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska 68502, USA
| | - Colin D Meiklejohn
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska 68502, USA
| | | | - J J Emerson
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
147
Porubsky D, Ebert P, Audano PA, Vollger MR, Harvey WT, Marijon P, Ebler J, Munson KM, Sorensen M, Sulovari A, Haukness M, Ghareghani M, Lansdorp PM, Paten B, Devine SE, Sanders AD, Lee C, Chaisson MJP, Korbel JO, Eichler EE, Marschall T. Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads. Nat Biotechnol 2021; 39:302-308. [PMID: 33288906 PMCID: PMC7954704 DOI: 10.1038/s41587-020-0719-5] [Citation(s) in RCA: 88] [Impact Index Per Article: 29.3] [Received: 11/22/2019] [Accepted: 09/16/2020] [Indexed: 12/18/2022]
Abstract
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
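The contiguity and phasing figures quoted in this abstract (contig N50, switch error rate) follow widely used definitions. As a rough illustration only, the Python sketch below computes both under stated assumptions: contig lengths are plain integers, and each haplotype is encoded as one 0/1 allele per heterozygous site, restricted to sites phased in both the prediction and the truth set. The function names `n50` and `switch_error_rate` are hypothetical; this is not the pipeline used by Porubsky et al.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Largest length L such that contigs of length >= L together
    cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


def switch_error_rate(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs whose relative phase
    differs between the predicted and the true haplotype.

    Assumption: both inputs give the allele (0 or 1) carried by one
    haplotype at the same ordered list of heterozygous sites.
    """
    if len(predicted) != len(truth) or len(predicted) < 2:
        raise ValueError("need two equal-length phasings with at least 2 sites")
    # 1 where the prediction disagrees with the truth at a site
    mismatch = [p ^ t for p, t in zip(predicted, truth)]
    # A switch occurs wherever the agreement state flips between neighbours
    switches = sum(a != b for a, b in zip(mismatch, mismatch[1:]))
    return switches / (len(mismatch) - 1)


if __name__ == "__main__":
    print(n50([10, 8, 6, 4, 2]))  # 8: 10 + 8 = 18 >= 15 (half of 30)
    print(switch_error_rate([0, 0, 1, 1], [0, 0, 0, 0]))  # one switch in 3 pairs
```

Because a haplotype and its complement describe the same phasing, `switch_error_rate([1, 1, 0, 0], [0, 0, 1, 1])` is 0.0 in this sketch, whereas a single flipped site in the middle of a block counts as two switches.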
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pierre Marijon
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Jana Ebler
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Maryam Ghareghani
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Center for Bioinformatics, Saarland University, and Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Peter M Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Ashley D Sanders
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Department of Life Science, Ewha Womans University, Seoul, Republic of Korea
| | - Mark J P Chaisson
- Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
148
Luo J, Wei Y, Lyu M, Wu Z, Liu X, Luo H, Yan C. A comprehensive review of scaffolding methods in genome assembly. Brief Bioinform 2021; 22:6149347. [PMID: 33634311 DOI: 10.1093/bib/bbab033] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/14/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 12/20/2022]
Abstract
In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically use the alignments between contigs and sequencing data (reads) to determine the orientation and order of contigs and to produce longer scaffolds, which are helpful for downstream genomic analyses. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially long-range sequencing reads, which have greatly enhanced the assembly quality of scaffolding methods. As the number of scaffolding methods increases, biology and bioinformatics researchers need in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differences in characteristics among various kinds of reads, the ways in which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.
Affiliation(s)
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Yawei Wei
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Mengna Lyu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Zhengjiang Wu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Xiaoyan Liu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
149
Minoche AE, Lundie B, Peters GB, Ohnesorg T, Pinese M, Thomas DM, Zankl A, Roscioli T, Schonrock N, Kummerfeld S, Burnett L, Dinger ME, Cowley MJ. ClinSV: clinical grade structural and copy number variant detection from whole genome sequencing data. Genome Med 2021; 13:32. [PMID: 33632298 PMCID: PMC7908648 DOI: 10.1186/s13073-021-00841-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 06/30/2020] [Accepted: 02/02/2021] [Indexed: 01/09/2023]
Abstract
Whole genome sequencing (WGS) has the potential to outperform clinical microarrays for the detection of structural variants (SVs), including copy number variants (CNVs), but has been challenged by high false positive rates. Here we present ClinSV, a WGS-based SV integration, annotation, prioritization, and visualization framework, which identified 99.8% of simulated pathogenic ClinVar CNVs > 10 kb and 11/11 pathogenic variants from matched microarrays. The false positive rate was low (1.5-4.5%) and reproducibility high (95-99%). In clinical practice, ClinSV identified reportable variants in 22 of 485 patients (4.7%), of which 35-63% were not detectable by current clinical microarray designs. ClinSV is available at https://github.com/KCCG/ClinSV.
Affiliation(s)
- Andre E Minoche
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
| | - Ben Lundie
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
| | - Greg B Peters
- Sydney Genome Diagnostics, The Children's Hospital at Westmead, Hawkesbury Road & Hainsworth Street, Westmead, NSW, Australia
| | - Thomas Ohnesorg
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- Genome.One, Darlinghurst, NSW, Australia
| | - Mark Pinese
- Children's Cancer Institute, University of New South Wales, Randwick, Sydney, NSW, Australia
- School of Women's and Children's Health, UNSW, Sydney, NSW, Australia
| | - David M Thomas
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
- The Kinghorn Cancer Centre and Cancer Division, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
| | - Andreas Zankl
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- Department of Clinical Genetics, The Children's Hospital at Westmead, Hawkesbury Road, Westmead, NSW, Australia
- Sydney Medical School, The University of Sydney, Camperdown, NSW, Australia
| | - Tony Roscioli
- NSW Health Pathology Randwick, Sydney, NSW, Australia
- Centre for Clinical Genetics, Sydney Children's Hospital, Randwick, NSW, Australia
- Prince of Wales Clinical School, University of New South Wales, Sydney, NSW, Australia
- Neuroscience Research Australia, University of New South Wales, Randwick, Sydney, NSW, Australia
| | - Nicole Schonrock
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- Genome.One, Darlinghurst, NSW, Australia
| | - Sarah Kummerfeld
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
| | - Leslie Burnett
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
- Genome.One, Darlinghurst, NSW, Australia
- Sydney Medical School, The University of Sydney, Camperdown, NSW, Australia
| | - Marcel E Dinger
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW, Australia
| | - Mark J Cowley
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
- Children's Cancer Institute, University of New South Wales, Randwick, Sydney, NSW, Australia
- School of Women's and Children's Health, UNSW, Sydney, NSW, Australia
150
Abstract
Since the human genome was published in 2001, many of the gaps in the original sequence have been filled in, offering a more detailed understanding of genome regulation, structure and function.