1
Thorburn DMJ, Sagonas K, Binzer-Panchal M, Chain FJJ, Feulner PGD, Bornberg-Bauer E, Reusch TBH, Samonte-Padilla IE, Milinski M, Lenz TL, Eizaguirre C. Origin matters: Using a local reference genome improves measures in population genomics. Mol Ecol Resour 2023; 23:1706-1723. [PMID: 37489282 DOI: 10.1111/1755-0998.13838] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/01/2023] [Revised: 05/10/2023] [Accepted: 06/02/2023] [Indexed: 07/26/2023]
Abstract
Genome sequencing enables answering fundamental questions about the genetic basis of adaptation, population structure and epigenetic mechanisms. Yet, we usually need a suitable reference genome for mapping population-level resequencing data. In some model systems, multiple reference genomes are available, giving the challenging task of determining which reference genome best suits the data. Here, we compared the use of two different reference genomes for the three-spined stickleback (Gasterosteus aculeatus), one novel genome derived from a European gynogenetic individual and the published reference genome of a North American individual. Specifically, we investigated the impact of using a local reference versus one generated from a distinct lineage on several common population genomics analyses. Through mapping genome resequencing data of 60 sticklebacks from across Europe and North America, we demonstrate that genetic distance among samples and the reference genomes impacts downstream analyses. Using a local reference genome increased mapping efficiency and genotyping accuracy, effectively retaining more and better data. Despite comparable distributions of the metrics generated across the genome using SNP data (i.e. π, Tajima's D and FST), window-based statistics using different references resulted in different outlier genes and enriched gene functions. A marker-based analysis of DNA methylation distributions had a comparably high overlap in outlier genes and functions, yet with distinct differences depending on the reference genome. Overall, our results highlight how using a local reference genome decreases reference bias to increase confidence in downstream analyses of the data. Such results have significant implications in all reference-genome-based population genomic analyses.
Affiliation(s)
- Doko-Miles J Thorburn
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Department of Life Sciences, Imperial College London, London, UK
- Kostas Sagonas
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Department of Zoology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Mahesh Binzer-Panchal
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, National Bioinformatics Infrastructure Sweden (NBIS), Uppsala University, Uppsala, Sweden
- Frederic J J Chain
- Department of Biological Sciences, University of Massachusetts Lowell, Lowell, Massachusetts, USA
- Philine G D Feulner
- Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry, EAWAG Swiss Federal Institute of Aquatic Science and Technology, Kastanienbaum, Switzerland
- Division of Aquatic Ecology and Evolution, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Erich Bornberg-Bauer
- Evolutionary Bioinformatics, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Thorsten B H Reusch
- Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany
- Irene E Samonte-Padilla
- Department of Evolutionary Ecology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Manfred Milinski
- Department of Evolutionary Ecology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Tobias L Lenz
- Research Group for Evolutionary Immunogenomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
- Christophe Eizaguirre
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
2
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023] Open
Abstract
Since the release of the complete human genome, the priority of human genomic study has now been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as reference respectively showed that the reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miscalled with different reference genomes. Finally, applying the CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian specific ones undetected using CHM13 as the reference. Our analyses reveal the advantages of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Affiliation(s)
- Chentao Yang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yang Zhou
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI Research-Wuhan, BGI, Wuhan, Hubei, China
- Yanni Song
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Dongya Wu
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
- Yan Zeng
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Lei Nie
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Guangji Chen
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jinjin Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Hongling Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Long Zhou
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
- Xiaobo Qian
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Chenlu Liu
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Wei Dai
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Mengyang Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Yanwei Qi
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Xiaobo Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lidong Guo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Aijun Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Yuan Deng
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yong Zhang
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yunqiu He
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Chunxue Guo
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Hangzhou, Hangzhou, Zhejiang, China
- Guoji Guo
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
- Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
- Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
3
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
4
Van Der Merwe N, Ramesar R, De Vries J. Whole Exome Sequencing in South Africa: Stakeholder Views on Return of Individual Research Results and Incidental Findings. Front Genet 2022; 13:864822. [PMID: 35754817 PMCID: PMC9216214 DOI: 10.3389/fgene.2022.864822] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/28/2022] [Accepted: 03/30/2022] [Indexed: 11/17/2022] Open
Abstract
The use of whole exome sequencing (WES) in medical research is increasing in South Africa (SA), raising important questions about whether and which individual genetic research results, particularly incidental findings, should be returned to patients. Whilst some commentaries and opinions related to the topic have been published in SA, there is no qualitative data on the views of professional stakeholders on this topic. Seventeen participants including clinicians, genomics researchers, and genetic counsellors (GCs) were recruited from the Western Cape in SA. Semi-structured interviews were conducted, and the transcripts analysed using the framework approach for data analysis. Current roadblocks for the clinical adoption of WES in SA include a lack of standardised guidelines; complexities relating to variant interpretation due to lack of functional studies and underrepresentation of people of African ancestry in the reference genome, population and variant databases; lack of resources and skilled personnel for variant confirmation and follow-up. Suggestions to overcome these barriers include obtaining funding and buy-in from the private and public sectors and medical insurance companies; the generation of a locally relevant reference genome; training of health professionals in the field of genomics and bioinformatics; and multidisciplinary collaboration. Participants emphasised the importance of upscaling the accessibility to and training of GCs, as well as upskilling of clinicians and genetic nurses for return of genetic data in collaboration with GCs and medical geneticists. Future research could focus on exploring the development of stakeholder partnerships for increased access to trained specialists as well as community engagement and education, alongside the development of guidelines for result disclosure.
Affiliation(s)
- Nicole Van Der Merwe
- UCT/MRC Genomic and Precision Medicine Research Unit, Division of Human Genetics, Institute for Infectious Diseases and Molecular Medicine, Department of Pathology, Faculty of Medicine and Health Sciences, University of Cape Town, Cape Town, South Africa
- Department of Pathology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, South Africa
- Raj Ramesar
- UCT/MRC Genomic and Precision Medicine Research Unit, Division of Human Genetics, Institute for Infectious Diseases and Molecular Medicine, Department of Pathology, Faculty of Medicine and Health Sciences, University of Cape Town, Cape Town, South Africa
- Jantina De Vries
- Department of Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Neuroscience Institute, Faculty of Health Sciences, University of Cape Town, Observatory, South Africa
5
Reimagining India’s Health System: Technology Levers for Universal Health Care. J Indian Inst Sci 2022; 102:743-752. [PMID: 36093275 PMCID: PMC9449281 DOI: 10.1007/s41745-022-00326-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 07/04/2022] [Indexed: 12/03/2022]
Abstract
Just as the COVID-19 pandemic highlighted the inadequacies of our current health systems and rekindled the debate around universal health care, the Lancet Citizens’ Commission on Reimagining India’s Health System was launched in late 2020. As a part of the commission, we articulated how technology can enable universal health care. We begin by stating the foundational values—a set of normative statements—that should underpin the use of technology in our health systems. Then, after summarising the paradigm shifts necessary to achieve citizen-centred universal health care, we articulate five ‘technology levers’ to enable those shifts. Finally, we describe the intersections and synergies between technology and the other pillars of health systems, namely, human resources, financing, governance and citizens’ engagement.
6
Kaminow B, Ballouz S, Gillis J, Dobin A. Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses. Genome Res 2022; 32:738-749. [PMID: 35256454 PMCID: PMC8997357 DOI: 10.1101/gr.275613.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Accepted: 03/02/2022] [Indexed: 11/25/2022]
Abstract
The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. In order to find the best haploid genome representation, we constructed consensus genomes at the pan-human, super-population, and population levels, utilizing variant information from the 1000 Genomes Project. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of ~2-3 when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no increase over using the pan-human consensus, suggesting a limit in the utility of incorporating more specific genomic variation. Replacing reference with consensus genomes impacts functional analyses, such as differential expressions of isoforms, genes, and splice junctions.
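The consensus-genome idea in this abstract can be illustrated with a minimal sketch. It assumes biallelic single-nucleotide variants only and treats an alternate allele as "major" when its frequency in the chosen population exceeds 0.5; the `Snv` type, the `build_consensus` function and the 0.5 cutoff are illustrative assumptions, not the authors' pipeline, which may also handle indels.

```python
from typing import Iterable, NamedTuple


class Snv(NamedTuple):
    """Hypothetical record for a biallelic single-nucleotide variant."""
    pos: int         # 0-based position on the reference sequence
    ref: str         # reference base
    alt: str         # alternate base
    alt_freq: float  # alternate allele frequency in the chosen population


def build_consensus(reference: str, variants: Iterable[Snv]) -> str:
    """Swap in the alternate base wherever it is the major allele."""
    bases = list(reference)
    for v in variants:
        if bases[v.pos] != v.ref:
            raise ValueError(f"reference base mismatch at position {v.pos}")
        if v.alt_freq > 0.5:  # assumed cutoff: alt allele is the majority
            bases[v.pos] = v.alt
    return "".join(bases)


if __name__ == "__main__":
    variants = [Snv(1, "C", "T", 0.8), Snv(4, "A", "G", 0.2)]
    print(build_consensus("ACGTAC", variants))
```

Passing allele frequencies from a pan-human, super-population or population panel into the same function would give the three consensus levels the abstract compares.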
Affiliation(s)
- Benjamin Kaminow
- Cold Spring Harbor Laboratory; Weill Cornell Graduate School of Medical Sciences
- Sara Ballouz
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research; School of Medical Sciences, University of New South Wales; Cold Spring Harbor Laboratory
7
Marwaha S, Knowles JW, Ashley EA. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med 2022; 14:23. [PMID: 35220969 PMCID: PMC8883622 DOI: 10.1186/s13073-022-01026-w] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 06/09/2021] [Accepted: 02/10/2022] [Indexed: 02/07/2023] Open
Abstract
Rare diseases affect 30 million people in the USA and more than 300–400 million worldwide, often causing chronic illness, disability, and premature death. Traditional diagnostic techniques rely heavily on heuristic approaches, coupling clinical experience from prior rare disease presentations with the medical literature. A large number of rare disease patients remain undiagnosed for years and many even die without an accurate diagnosis. In recent years, gene panels, microarrays, and exome sequencing have helped to identify the molecular cause of such rare and undiagnosed diseases. These technologies have allowed diagnoses for a sizable proportion (25–35%) of undiagnosed patients, often with actionable findings. However, a large proportion of these patients remain undiagnosed. In this review, we focus on technologies that can be adopted if exome sequencing is unrevealing. We discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling. We highlight computational methods to help identify regionally distant patients with similar phenotypes or similar genetic mutations. Finally, we describe approaches to automate and accelerate genomic analysis. The strategies discussed here are intended to serve as a guide for clinicians and researchers in the next steps when encountering patients with non-diagnostic exomes.
8
Martínez-García M, Hernández-Lemus E. Data Integration Challenges for Machine Learning in Precision Medicine. Front Med (Lausanne) 2022; 8:784455. [PMID: 35145977 PMCID: PMC8821900 DOI: 10.3389/fmed.2021.784455] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/27/2021] [Accepted: 12/28/2021] [Indexed: 12/19/2022] Open
Abstract
A main goal of Precision Medicine is to incorporate and integrate the vast corpora held in different databases about the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostics and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at prediction of personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to be able to efficiently manage, visualize and integrate large datasets combining structured and unstructured formats. This needs to be done while constrained by different levels of confidentiality, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in the design of successful approaches to medical data analytics under the currently demanding performance conditions of personalized medicine, while also subject to time, computational power, and bioethical constraints. Here, we will review some of these constraints and discuss possible avenues to overcome current challenges.
Affiliation(s)
- Mireya Martínez-García
- Clinical Research Division, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
9
Whirl-Carrillo M, Huddart R, Gong L, Sangkuhl K, Thorn CF, Whaley R, Klein TE. An Evidence-Based Framework for Evaluating Pharmacogenomics Knowledge for Personalized Medicine. Clin Pharmacol Ther 2021; 110:563-572. [PMID: 34216021 PMCID: PMC8457105 DOI: 10.1002/cpt.2350] [Citation(s) in RCA: 265] [Impact Index Per Article: 88.3] [Received: 04/02/2021] [Accepted: 06/16/2021] [Indexed: 11/23/2022]
Abstract
Clinical annotations are one of the most popular resources available on the Pharmacogenomics Knowledgebase (PharmGKB). Each clinical annotation summarizes the association between variant‐drug pairs, shows relevant findings from the curated literature, and is assigned a level of evidence (LOE) to indicate the strength of support for that association. Evidence from the pharmacogenomic literature is curated into PharmGKB as variant annotations, which can be used to create new clinical annotations or added to existing clinical annotations. This means that the same clinical annotation can be worked on by multiple curators over time. As more evidence is curated into PharmGKB, the task of maintaining consistency when assessing all the available evidence and assigning an LOE becomes increasingly difficult. To remedy this, a scoring system has been developed to automate LOE assignment to clinical annotations. Variant annotations are scored according to certain attributes, including study size, reported P value, and whether the variant annotation supports or fails to find an association. Clinical guidelines or US Food and Drug Administration (FDA)‐approved drug labels which give variant‐specific prescribing guidance are also scored. The scores of all annotations attached to a clinical annotation are summed together to give a total score for the clinical annotation, which is used to calculate an LOE. Overall, the system increases transparency, consistency, and reproducibility in LOE assignment to clinical annotations. In combination with increased standardization of how clinical annotations are written, use of this scoring system helps to ensure that PharmGKB clinical annotations continue to be a robust source of pharmacogenomic information.
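The scoring scheme summarized in this abstract (score each annotation, sum the scores, map the total to a level of evidence) can be sketched as follows. Every weight, threshold and level label below is a hypothetical placeholder chosen for illustration; the abstract does not give PharmGKB's actual values, and this is not their implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VariantAnnotation:
    """Hypothetical record holding the attributes named in the abstract."""
    study_size: int
    p_value: float
    supports_association: bool


def score_annotation(a: VariantAnnotation) -> float:
    """Score one annotation; all weights here are illustrative assumptions."""
    score = 0.0
    if a.study_size >= 500:
        score += 2.0
    elif a.study_size >= 50:
        score += 1.0
    if a.p_value < 0.01:
        score += 1.0
    # Assumption: an annotation that fails to find an association counts against it.
    return score if a.supports_association else -score


def total_score(annotations: Iterable[VariantAnnotation],
                guideline_scores: Iterable[float] = ()) -> float:
    """Sum annotation scores plus separately assigned guideline or label scores."""
    return sum(score_annotation(a) for a in annotations) + sum(guideline_scores)


def level_of_evidence(total: float) -> str:
    """Map a total score to a level; cutoffs and labels are placeholders."""
    if total >= 8:
        return "high"
    if total >= 3:
        return "moderate"
    return "low"


if __name__ == "__main__":
    anns = [
        VariantAnnotation(1000, 0.001, True),
        VariantAnnotation(100, 0.03, True),
        VariantAnnotation(60, 0.2, False),
    ]
    print(level_of_evidence(total_score(anns)))
```

Because the level is a pure function of the attached annotations, two curators working on the same clinical annotation at different times arrive at the same level, which is the consistency benefit the abstract describes.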
Affiliation(s)
- Michelle Whirl-Carrillo
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California, USA
- Rachel Huddart
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California, USA
- Li Gong
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California, USA
- Katrin Sangkuhl
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California, USA
- Caroline F Thorn
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California, USA
- Ryan Whaley
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California, USA
- Teri E Klein
- Department of Biomedical Data Science and Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
10
Mun T, Chen NC, Langmead B. LevioSAM: Fast lift-over of variant-aware reference alignments. Bioinformatics 2021; 37:4243-4245. [PMID: 34037690 PMCID: PMC9502237 DOI: 10.1093/bioinformatics/btab396] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/23/2020] [Revised: 03/31/2021] [Accepted: 05/24/2021] [Indexed: 01/12/2023] Open
Abstract
Motivation: As more population genetics datasets and population-specific references become available, the task of translating (‘lifting’) read alignments from one reference coordinate system to another is becoming more common. Existing tools generally require a chain file, whereas VCF files are the more common way to represent variation. Existing tools also do not make effective use of threads, creating a post-alignment bottleneck. Results: LevioSAM is a tool for lifting SAM/BAM alignments from one reference to another using a VCF file containing population variants. LevioSAM uses succinct data structures and scales efficiently to many threads. When run downstream of a read aligner, levioSAM is more than 7 times faster than an aligner when both are run with 16 threads. Availability and implementation: Software package: https://github.com/alshai/levioSAM, Experiments: https://github.com/langmead-lab/levioSAM-experiments. Supplementary information: Supplementary data are available at Bioinformatics online.
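The coordinate translation that lift-over performs can be shown with a deliberately naive sketch. It is not levioSAM's algorithm (which the abstract says relies on succinct data structures and threading); it is a linear scan over a small list of non-overlapping variants, with hypothetical names, that maps a position on a variant-incorporating reference back to the standard reference. Alleles follow the VCF convention of being non-empty, so an insertion or deletion carries its anchor base.

```python
from typing import List, Tuple

# (0-based start on the standard reference, ref allele, alt allele)
Variant = Tuple[int, str, str]


def lift_to_reference(alt_pos: int, variants: List[Variant]) -> int:
    """Map a 0-based position on the alternate reference to the standard one.

    Assumes the variants do not overlap and that every one of them was
    applied when the alternate reference was built.
    """
    shift = 0  # alternate coordinate minus standard coordinate so far
    for ref_start, ref, alt in sorted(variants):
        alt_start = ref_start + shift
        if alt_pos < alt_start:
            break  # position lies before this variant
        if alt_pos < alt_start + len(alt):
            # Inside the alt allele: clamp onto the ref allele's last base.
            return ref_start + min(alt_pos - alt_start, len(ref) - 1)
        shift += len(alt) - len(ref)
    return alt_pos - shift


if __name__ == "__main__":
    # Standard reference "ACGTACGT"; the alternate reference is "ACGTTTACT".
    variants = [(2, "G", "GTT"), (5, "CG", "C")]
    for p in range(9):
        print(p, lift_to_reference(p, variants))
```

A full alignment lift would also have to rewrite the CIGAR string and mate fields of each SAM record; the sketch covers only the position mapping.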
Affiliation(s)
- Taher Mun
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, USA
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, USA
- Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, USA
11
Takayama J, Tadaka S, Yano K, Katsuoka F, Gocho C, Funayama T, Makino S, Okamura Y, Kikuchi A, Sugimoto S, Kawashima J, Otsuki A, Sakurai-Yageta M, Yasuda J, Kure S, Kinoshita K, Yamamoto M, Tamiya G. Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference. Nat Commun 2021; 12:226. [PMID: 33431880 PMCID: PMC7801658 DOI: 10.1038/s41467-020-20146-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/05/2019] [Accepted: 11/17/2020] [Indexed: 12/21/2022] Open
Abstract
The complete human genome sequence is used as a reference for next-generation sequencing analyses. However, some ethnic ancestries are under-represented in the reference genome (e.g., GRCh37) due to its bias toward European and African ancestries. Here, we perform de novo assembly of three Japanese male genomes using > 100× Pacific Biosciences long reads and Bionano Genomics optical maps per sample. We integrate the genomes using the major allele for consensus and anchor the scaffolds using genetic and radiation hybrid maps to reconstruct each chromosome. The resulting genome sequence, JG1, is contiguous, accurate, and carries the Japanese major allele at most loci. We adopt JG1 as the reference for confirmatory exome re-analyses of seven rare-disease Japanese families and find that re-analysis using JG1 reduces total candidate variant calls versus GRCh37 while retaining disease-causing variants. These results suggest that integrating multiple genomes from a single population can aid genome analyses of that population.
Affiliation(s)
- Jun Takayama
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Shu Tadaka
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Kenji Yano
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Fumiki Katsuoka
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Chinatsu Gocho
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Takamitsu Funayama
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Satoshi Makino
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Yasunobu Okamura
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Atsuo Kikuchi
  - Department of Pediatrics, Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Sachiyo Sugimoto
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Junko Kawashima
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Akihito Otsuki
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Mika Sakurai-Yageta
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Jun Yasuda
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Division of Molecular and Cellular Oncology, Miyagi Cancer Center Research Institute, 47-1, Nodayama, Medeshima-Shiode, Natori, Miyagi, 981-1293, Japan
- Shigeo Kure
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Department of Pediatrics, Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Kengo Kinoshita
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Graduate School of Information Sciences, Tohoku University, 6-3-09 Aramaki Aza-Aoba, Aoba-ku, Sendai, Miyagi, 980-8579, Japan
- Masayuki Yamamoto
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Gen Tamiya
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
  - Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
|
12
|
Chen NC, Solomon B, Mun T, Iyer S, Langmead B. Reference flow: reducing reference bias using multiple population genomes. Genome Biol 2021. [PMID: 33397413 DOI: 10.1101/2020.03.03.975219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/15/2023] Open
Abstract
Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
Affiliation(s)
- Nae-Chyun Chen
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Brad Solomon
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Taher Mun
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Sheila Iyer
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
|
13
|
Chen NC, Solomon B, Mun T, Iyer S, Langmead B. Reference flow: reducing reference bias using multiple population genomes. Genome Biol 2021; 22:8. [PMID: 33397413 PMCID: PMC7780692 DOI: 10.1186/s13059-020-02229-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2020] [Accepted: 12/08/2020] [Indexed: 12/30/2022] Open
Abstract
Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
Affiliation(s)
- Nae-Chyun Chen
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Brad Solomon
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Taher Mun
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Sheila Iyer
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
|
14
|
Fang J, Pian C, Xu M, Kong L, Li Z, Ji J, Zhang L, Chen Y. Revealing Prognosis-Related Pathways at the Individual Level by a Comprehensive Analysis of Different Cancer Transcription Data. Genes (Basel) 2020; 11:genes11111281. [PMID: 33138076 PMCID: PMC7692404 DOI: 10.3390/genes11111281] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2020] [Revised: 10/26/2020] [Accepted: 10/26/2020] [Indexed: 02/07/2023] Open
Abstract
Identifying perturbed pathways at an individual level is important to discover the causes of cancer and develop individualized therapeutic strategies. Though prognostic gene lists have had success in prognosis prediction, using single genes that are related to the relevant system or specific network cannot fully reveal the process of tumorigenesis. We hypothesize that in individual samples, the disruption of transcription homeostasis can influence the occurrence, development, and metastasis of tumors and has implications for patient survival outcomes. Here, we introduce the individual-level pathway score (iPS), which measures the correlation perturbation of pathways within a single sample. We applied this method to the expression data of 16 different cancer types from The Cancer Genome Atlas (TCGA) database. Our results indicate that different cancer types as well as their tumor-adjacent tissues can be clearly distinguished by the iPS. Additionally, we found strong heterogeneity among cancer types: both the percentage of perturbed pathways and the proportion of perturbed tumor samples in each pathway differed significantly. Finally, the prognosis-related pathways of different cancer types were obtained by survival analysis. We demonstrated that the iPS is capable of classifying cancer types and identifying some key prognosis-related pathways.
Affiliation(s)
- Jingya Fang
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Cong Pian
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Mingmin Xu
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Lingpeng Kong
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Zutan Li
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Jinwen Ji
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Liangyun Zhang
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
  - Corresponding author
- Yuanyuan Chen
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing 210095, China
  - Corresponding author
|
15
|
Crysnanto D, Pausch H. Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery. Genome Biol 2020; 21:184. [PMID: 32718320 PMCID: PMC7385871 DOI: 10.1186/s13059-020-02105-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Accepted: 07/14/2020] [Indexed: 09/28/2023] Open
Abstract
BACKGROUND The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals by hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references. RESULTS We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that read mapping is more accurate to variation-aware than linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs enable almost similar mapping accuracy improvements over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels. CONCLUSIONS We develop the first variation-aware reference graph for an agricultural animal ( https://doi.org/10.5281/zenodo.3759712 ). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
|
16
|
Crysnanto D, Pausch H. Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery. Genome Biol 2020; 21:184. [PMID: 32718320 PMCID: PMC7385871 DOI: 10.1186/s13059-020-02105-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Accepted: 07/14/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals by hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references. RESULTS We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that read mapping is more accurate to variation-aware than linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs enable almost similar mapping accuracy improvements over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels. CONCLUSIONS We develop the first variation-aware reference graph for an agricultural animal ( https://doi.org/10.5281/zenodo.3759712 ). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
|
17
|
Al-Khawaga S, Mohammed I, Saraswathi S, Haris B, Hasnah R, Saeed A, Almabrazi H, Syed N, Jithesh P, El Awwa A, Khalifa A, AlKhalaf F, Petrovski G, Abdelalim EM, Hussain K. The clinical and genetic characteristics of permanent neonatal diabetes (PNDM) in the state of Qatar. Mol Genet Genomic Med 2019; 7:e00753. [PMID: 31441606 PMCID: PMC6785445 DOI: 10.1002/mgg3.753] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Revised: 04/04/2019] [Accepted: 04/27/2019] [Indexed: 02/06/2023] Open
Abstract
Background: Neonatal diabetes mellitus (NDM) is a rare condition that occurs within the first six months of life. Permanent NDM (PNDM) is caused by mutations in specific genes that are known for their expression at early and/or late stages of pancreatic beta‐cell development and are involved in beta‐cell survival, insulin processing, regulation, and release. The native population in Qatar continues to practice consanguineous marriages that lead to a high level of homozygosity. To our knowledge, there is no previous report on the genomics of NDM among the Qatari population. The aims of the current study are to identify patients with NDM diagnosed between 2001 and 2016, and examine their clinical and genetic characteristics. Methods: To calculate the incidence of PNDM, all patients with PNDM diagnosed between 2001 and 2016 were compared to the total number of live births over the 16‐year period. Whole Genome Sequencing (WGS) was used to investigate the genetic etiology in the PNDM cohort. Results: PNDM was diagnosed in nine patients, with an estimated incidence rate of 1:22,938 live births among the indigenous Qatari. Seven different mutations in six genes (PTF1A, GCK, SLC2A2, EIF2AK3, INS, and HNF1B) were identified. In the majority of cases, the genetic etiology was part of a previously identified autosomal recessive disorder. Two novel de novo mutations were identified in INS and HNF1B. Conclusion: Qatar has the second highest reported incidence of PNDM worldwide. A majority of PNDM cases present as rare familial autosomal recessive disorders. Pancreas associated transcription factor 1a (PTF1A) enhancer deletions are the most common cause of PNDM in Qatar, with only a few previous cases reported in the literature.
Affiliation(s)
- Sara Al-Khawaga
  - College of Health & Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
  - Diabetes Research Center, Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Idris Mohammed
  - College of Health & Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Saras Saraswathi
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Basma Haris
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Reem Hasnah
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Amira Saeed
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Najeeb Syed
  - Biomedical Informatics Division, Sidra Medicine, Doha, Qatar
- Puthen Jithesh
  - Biomedical Informatics Division, Sidra Medicine, Doha, Qatar
- Ahmed El Awwa
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
  - Faculty of Medicine, Alexandria University, Alexandria, Egypt
- Amal Khalifa
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Fawziya AlKhalaf
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Goran Petrovski
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Essam M Abdelalim
  - College of Health & Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
  - Diabetes Research Center, Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Khalid Hussain
  - Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
|
18
|
Abstract
The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a 'type specimen'. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference with few disadvantages.
Affiliation(s)
- Sara Ballouz
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
- Alexander Dobin
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
- Jesse A Gillis
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
|
19
|
Shukla HG, Bawa PS, Srinivasan S. hg19KIndel: ethnicity normalized human reference genome. BMC Genomics 2019; 20:459. [PMID: 31170919 PMCID: PMC6555027 DOI: 10.1186/s12864-019-5854-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2019] [Accepted: 05/29/2019] [Indexed: 11/22/2022] Open
Abstract
Background: The most widely used human genome reference assembly, hg19, harbors minor alleles at 2.18 million positions, as revealed by the 1000 Genomes Phase 3 dataset. Although this is less than 2% of the 89 million variants reported, it has been shown that the minor alleles can result in 30% false positives in individual genomes, thus misleading and burdening downstream interpretation. More alarming, a significant percentage of variants that are homozygous recessive for these minor alleles, with potential disease implications, are masked from reporting. Results: We have demonstrated that the false positives (FP) and false negatives (FN) can be corrected for by simply replacing nucleotides at the minor allele positions in hg19 with the corresponding major allele. Here, we have replaced 2.18 million minor alleles in hg19, comprising single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs), and multiple nucleotide polymorphisms (MNPs), with the corresponding major alleles to create an ethnically normalized reference genome called hg19KIndel. In doing so, hg19KIndel has both corrected for sequencing errors acknowledged to be present in hg19 and improved read alignment near the minor alleles in hg19. Conclusion: We have created and made available a new version of the human reference genome called hg19KIndel. Variant calling using hg19KIndel significantly reduces false-positive calls, which in turn reduces the burden on downstream analysis and validation. It also reduces false-negative calls: variants that were missed because of the minor alleles in hg19 are now called using hg19KIndel. Using hg19KIndel also yields a better mapping percentage than the currently available human reference genome. The hg19KIndel reference genome and its auxiliary datasets are available at 10.5281/zenodo.2638113.
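The substitution step described in this abstract can be illustrated with a short sketch. It is an assumption-laden simplification rather than the authors' pipeline: it handles biallelic SNPs only (the paper also covers INDELs and MNPs), treats an alternate allele with frequency above 0.5 as the major allele, and the function name `majorize_reference` and the example data are hypothetical.

```python
def majorize_reference(reference, snps):
    """Return a copy of `reference` carrying the major allele at each SNP.

    `snps` is an iterable of (pos, ref_base, alt_base, alt_freq) tuples with
    0-based positions. The 0.5 threshold is an assumption for biallelic sites.
    """
    seq = list(reference)
    for pos, ref_base, alt_base, alt_freq in snps:
        # Guard against variant records that do not match the reference.
        if seq[pos] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        # The reference base is the minor allele when the alternate is more common.
        if alt_freq > 0.5:
            seq[pos] = alt_base
    return "".join(seq)


# Hypothetical toy input: only the first site has a minor reference allele.
toy_snps = [(1, "C", "T", 0.8), (4, "A", "G", 0.2)]
normalized = majorize_reference("ACGTAC", toy_snps)
```

Restricting the sketch to SNPs keeps coordinates unchanged; replacing INDELs would shift downstream positions and require the kind of coordinate bookkeeping a real tool has to provide.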
Affiliation(s)
- Harsh G Shukla
  - Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore, 560100, India
- Pushpinder Singh Bawa
  - Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore, 560100, India
  - Manipal Academy of Higher Education (MAHE), Manipal, India
- Subhashini Srinivasan
  - Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore, 560100, India
|
20
|
Maroilley T, Tarailo-Graovac M. Uncovering Missing Heritability in Rare Diseases. Genes (Basel) 2019; 10:E275. [PMID: 30987386 PMCID: PMC6523881 DOI: 10.3390/genes10040275] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2019] [Revised: 03/29/2019] [Accepted: 04/01/2019] [Indexed: 12/14/2022] Open
Abstract
The problem of 'missing heritability' affects both common and rare diseases, hindering discovery, diagnosis, and patient care. The 'missing heritability' concept has been mainly associated with common and complex diseases where promising modern technological advances, like genome-wide association studies (GWAS), were unable to uncover the complete genetic mechanism of the disease/trait. Although rare diseases (RDs) have low prevalence individually, collectively they are common. Furthermore, multi-level genetic and phenotypic complexity, when combined with the individual rarity of these conditions, poses an important challenge in the quest to identify causative genetic changes in RD patients. In recent years, high throughput sequencing has accelerated discovery and diagnosis in RDs. However, despite the several-fold increase (from ~10% using traditional to ~40% using genome-wide genetic testing) in finding genetic causes of these diseases in RD patients, the majority of RDs, as is the case for common diseases, still face the 'missing heritability' problem. This review outlines the key role of high throughput sequencing in uncovering genetics behind RDs, with a particular focus on genome sequencing. We review current advances and challenges of sequencing technologies, bioinformatics approaches, and resources.
Affiliation(s)
- Tatiana Maroilley
  - Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Maja Tarailo-Graovac
  - Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
|
21
|
Identification of an iron-responsive subtype in two children diagnosed with relapsing-remitting multiple sclerosis using whole exome sequencing. Mol Genet Metab Rep 2019; 19:100465. [PMID: 30963028 PMCID: PMC6434495 DOI: 10.1016/j.ymgmr.2019.100465] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2019] [Revised: 03/16/2019] [Accepted: 03/16/2019] [Indexed: 12/26/2022] Open
Abstract
Background: Multiple sclerosis is a disorder related to demyelination of axons. Iron is an essential cofactor in myelin synthesis. Previously, we described two children (males of mixed ancestry) with relapsing-remitting multiple sclerosis (RRMS) where long-term remission was achieved by regular iron supplementation. A genetic defect in iron metabolism was postulated, suggesting that more advanced genetic studies could shed new light on disease pathophysiology related to iron. Methods: Whole exome sequencing (WES) was performed to identify causal pathways. Blood tests were performed over a 10-year period to monitor the long-term effect of a supplementation regimen. Clinical wellbeing was assessed quarterly by a pediatric neurologist and regular feedback was obtained from the schoolteachers. Results: WES revealed gene variants involved in iron absorption and transport, in the transmembrane protease, serine 6 (TMPRSS6) and transferrin (TF) genes; multiple genetic variants in CUBN, which encodes cubilin (a receptor involved in the absorption of vitamin B12 as well as the reabsorption of transferrin-bound iron and vitamin D in the kidneys); SLC25A37 (involved in iron transport into mitochondria) and CD163 (a scavenger receptor involved in hemorrhage resolution). Variants were also found in COQ3, involved with synthesis of Coenzyme Q10 in mitochondria. Neither of the children had the HLA-DRB1*1501 allele associated with increased genetic risk for MS, suggesting that the genetic contribution of iron-related genetic variants may be instrumental in childhood MS. In both children the RRMS has remained stable without activity over the last 10 years since initiation of nutritional supplementation and maintenance of normal iron levels, confirming the role of iron deficiency in disease pathogenesis in these patients.
Conclusion: Our findings highlight the potential value of WES to identify heritable risk factors that could affect the reabsorption of transferrin-bound iron in the kidneys causing sustained iron loss, together with inhibition of vitamin B12 absorption and vitamin D reabsorption (CUBN) and iron transport into mitochondria (SLC25A37) as the sole site of heme synthesis. This supports a model for RRMS in children with an apparent iron-deficient biochemical subtype of MS, with oligodendrocyte cell death and impaired myelination possibly caused by deficits of energy- and antioxidant capacity in mitochondria.
Key Words
- CNS, central nervous system
- CoQ, Coenzyme Q
- DFO, desferroxamine mesylate
- DIS, dissemination in space
- DIT, dissemination in time
- DMT, disease modifying therapy
- EDSS, Expanded Disability Status Scale
- ETC, electron transport chain
- GWAS, genome-wide association study
- Genetic variants
- HDL, high density lipoprotein
- HERV-W, human endogenous retrovirus W
- HLA, human leukocyte antigen
- HREC, human research ethics committee
- IPMSSG, International Pediatric Multiple Sclerosis Study Group
- IRE, iron-response element
- Iron deficiency
- MGA1, juvenile hereditary megaloblastic anemia 1
- MRI, magnetic resonance imaging
- MS, Multiple sclerosis
- MSRV, MS-associated retrovirus
- MST1R, macrophage stimulating-1 receptor
- Mitochondria
- Oxidative stress
- PSGT, pathology supported genetic testing
- Pediatric onset multiple sclerosis
- ROS, reactive oxygen species
- RRMS, relapsing-remitting MS
- SAMe, S-adenosyl methionine
- SDHB, iron-protein subunit of Complex II
- TF, transferrin
- TMPRSS6, transmembrane protease, serine 6
- WES, whole exome sequencing
- Whole exome sequencing
|
22
|
Pritt J, Chen NC, Langmead B. FORGe: prioritizing variants for graph genomes. Genome Biol 2018; 19:220. [PMID: 30558649 PMCID: PMC6296055 DOI: 10.1186/s13059-018-1595-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2018] [Accepted: 11/26/2018] [Indexed: 12/30/2022] Open
Abstract
There is growing interest in using genetic variants to augment the reference genome into a graph genome, with alternative sequences, to improve read alignment accuracy and reduce allelic bias. While adding a variant has the positive effect of removing an undesirable alignment score penalty, it also increases both the ambiguity of the reference genome and the cost of storing and querying the genome index. We introduce methods and a software tool called FORGe for modeling these effects and prioritizing variants accordingly. We show that FORGe enables a range of advantageous and measurable trade-offs between accuracy and computational overhead.
Affiliation(s)
- Jacob Pritt
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, USA
- Nae-Chyun Chen
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, USA
23
De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations. Nat Commun 2018; 9:3040. [PMID: 30072691 PMCID: PMC6072799 DOI: 10.1038/s41467-018-05513-w] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 03/16/2018] [Accepted: 07/11/2018] [Indexed: 12/20/2022]
Abstract
The human reference genome is used extensively in modern biological research. However, a single consensus representation is inadequate to provide a universal reference structure because it is a haplotype among many in the human population. Using 10× Genomics (10×G) “Linked-Read” technology, we perform whole genome sequencing (WGS) and de novo assembly on 17 individuals across five populations. We identify 1842 breakpoint-resolved non-reference unique insertions (NUIs) that, in aggregate, add up to 2.1 Mb of so far undescribed genomic content. Among these, 64% are considered ancestral to humans since they are found in non-human primate genomes. Furthermore, 37% of the NUIs can be found in the human transcriptome and 14% likely arose from Alu-recombination-mediated deletion. Our results underline the need of a set of human reference genomes that includes a comprehensive list of alternative haplotypes to depict the complete spectrum of genetic diversity across populations. The majority of the human reference genome assembly is represented as a single consensus haplotype. Here, Wong et al. analyze de novo assemblies of 17 diverse, haplotype-resolved genomes to gain insights into the structure of genetic diversity and compile a list of alternative haplotypes across populations.
24
Genome Sequencing in Hypertrophic Cardiomyopathy. J Am Coll Cardiol 2018; 72:430-433. [DOI: 10.1016/j.jacc.2018.05.029] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/15/2018] [Revised: 05/24/2018] [Accepted: 05/24/2018] [Indexed: 11/20/2022]
25
Abstract
Advances in omics technologies - such as genomics, transcriptomics, proteomics and metabolomics - have begun to enable personalized medicine at an extraordinarily detailed molecular level. Individually, these technologies have contributed medical advances that have begun to enter clinical practice. However, each technology individually cannot capture the entire biological complexity of most human diseases. Integration of multiple technologies has emerged as an approach to provide a more comprehensive view of biology and disease. In this Review, we discuss the potential for combining diverse types of data and the utility of this approach in human health and disease. We provide examples of data integration to understand, diagnose and inform treatment of diseases, including rare and common diseases as well as cancer and transplant biology. Finally, we discuss technical and other challenges to clinical implementation of integrative omics.
Affiliation(s)
- Konrad J Karczewski
  - Massachusetts General Hospital, Boston, MA, USA
  - The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Michael P Snyder
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
26
Abstract
In a Perspective, Joshua Knowles and Euan Ashley discuss the potential for use of genetic risk scores in clinical practice.
Affiliation(s)
- Joshua W. Knowles
  - Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
- Euan A. Ashley
  - Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
27
Kalayinia S, Goodarzynejad H, Maleki M, Mahdieh N. Next generation sequencing applications for cardiovascular disease. Ann Med 2018; 50:91-109. [PMID: 29027470 DOI: 10.1080/07853890.2017.1392595] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 10/18/2022]
Abstract
The Human Genome Project (HGP), as the primary sequencing of the human genome, lasted more than one decade to be completed using the traditional Sanger's method. At present, next-generation sequencing (NGS) technology could provide the genome sequence data in hours. NGS has also decreased the expense of sequencing; therefore, nowadays it is possible to carry out both whole-genome (WGS) and whole-exome sequencing (WES) for the detection of variations in patients with rare genetic diseases as well as complex disorders such as common cardiovascular diseases (CVDs). Finding new variants may contribute to establishing a risk profile for the pathology process of diseases. Here, recent applications of NGS in cardiovascular medicine are discussed; both Mendelian disorders of the cardiovascular system and complex genetic CVDs, including inherited cardiomyopathy, channelopathies, stroke and coronary artery disease (CAD), are considered. We also state some future use of NGS in clinical practice for increasing our information about the CVDs genetics and the limitations of this new technology.
Key messages
- Traditional Sanger's method was the mainstay for Human Genome Project (HGP); Sanger sequencing has high fidelity but is slow and costly as compared to next generation methods.
- Within cardiovascular medicine, NGS has been shown to be successful in identifying novel causative mutations and in the diagnosis of Mendelian diseases which are caused by a single variant in a single gene.
- NGS has provided the opportunity to perform parallel analysis of a great number of genes in an unbiased approach (i.e. without knowing the underlying biological mechanism) which probably contribute to advance our knowledge regarding the pathology of complex diseases such as CVD.
Affiliation(s)
- Samira Kalayinia
  - Cardiogenetic Research Laboratory, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Majid Maleki
  - Cardiogenetic Research Laboratory, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Nejat Mahdieh
  - Cardiogenetic Research Laboratory, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
28
Amor DJ, Kerr A, Somanathan N, McEwen A, Tome M, Hodgson J, Lewis S. Attitudes of sperm, egg and embryo donors and recipients towards genetic information and screening of donors. Reprod Health 2018; 15:26. [PMID: 29426347 PMCID: PMC5807856 DOI: 10.1186/s12978-018-0468-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/31/2017] [Accepted: 01/29/2018] [Indexed: 12/26/2022]
Abstract
BACKGROUND: Gamete and embryo donors undergo genetic screening procedures in order to maximise the health of donor-conceived offspring. In the era of genomic medicine, expanded genetic screening may be offered to donors for the purpose of avoiding transmission of harmful genetic mutations. The objective of this study was to explore the attitudes of donors and recipients toward the expanded genetic screening of donors.
METHODS: Qualitative interview study with thematic analysis, undertaken in a tertiary fertility centre. Semi-structured in-depth qualitative interviews were conducted with eleven recipients and nine donors from three different cohorts (sperm, egg and embryo donors/recipients).
RESULTS: Donors and recipients acknowledged the importance of genetic information and were comfortable with the existing level of genetic screening of donors. Recipients recognised some potential benefits of expanded genetic screening of donors; however both recipients and donors were apprehensive about extended genomic technologies, with concerns about how this information would be used and the ethics of genetic selectivity.
CONCLUSION: Participants in donor programs support some level of genetic screening of donors, but are wary of expanding genetic screening beyond current levels.
Affiliation(s)
- David J Amor
  - Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Australia
  - Melbourne IVF, East Melbourne, Australia
- Annabelle Kerr
  - Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Australia
- Nandini Somanathan
  - Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Australia
- Alison McEwen
  - Graduate School of Health, University of Technology, Sydney, Australia
- Jan Hodgson
  - Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Australia
- Sharon Lewis
  - Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Australia
29
Koko M, Abdallah MOE, Amin M, Ibrahim M. Challenges imposed by minor reference alleles on the identification and reporting of clinical variants from exome data. BMC Genomics 2018; 19:46. [PMID: 29334895 PMCID: PMC5769444 DOI: 10.1186/s12864-018-4433-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/27/2017] [Accepted: 01/03/2018] [Indexed: 12/30/2022]
Abstract
Background: The conventional variant calling of pathogenic alleles in exome and genome sequencing requires the presence of the non-pathogenic alleles as genome references. This hinders the correct identification of variants with minor and/or pathogenic reference alleles warranting additional approaches for variant calling.
Results: More than 26,000 Exome Aggregation Consortium (ExAC) variants have a minor reference allele including variants with known ClinVar disease alleles. For instance, in a number of variants related to clotting disorders, the phenotype-associated allele is a human genome reference allele (rs6025, rs6003, rs1799983, and rs2227564 using the assembly hg19). We highlighted how the current variant calling standards miss homozygous reference disease variants in these sites and provided a bioinformatic panel that can be used to screen these variants using commonly available variant callers. We present exome sequencing results from an individual with venous thrombosis to emphasize how pathogenic alleles in clinically relevant variants escape variant calling while non-pathogenic alleles are detected.
Conclusions: This article highlights the importance of specialized variant calling strategies in clinical variants with minor reference alleles especially in the context of personal genomes and exomes. We provide here a simple strategy to screen potential disease-causing variants when present in homozygous reference state.
Affiliation(s)
- Mahmoud Koko
  - Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, P. O. Box 102, Army Road, 11111, Khartoum, Sudan
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, Tübingen, Germany
- Mohammed O E Abdallah
  - Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, P. O. Box 102, Army Road, 11111, Khartoum, Sudan
- Mutaz Amin
  - Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, P. O. Box 102, Army Road, 11111, Khartoum, Sudan
  - Department of Biochemistry, Faculty of Medicine, University of Khartoum, Khartoum, Sudan
- Muntaser Ibrahim
  - Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, P. O. Box 102, Army Road, 11111, Khartoum, Sudan
30
Altman RB, Prabhu S, Sidow A, Zook JM, Goldfeder R, Litwack D, Ashley E, Asimenos G, Bustamante CD, Donigan K, Giacomini KM, Johansen E, Khuri N, Lee E, Liang XS, Salit M, Serang O, Tezak Z, Wall DP, Mansfield E, Kass-Hout T. A research roadmap for next-generation sequencing informatics. Sci Transl Med 2017; 8:335ps10. [PMID: 27099173 DOI: 10.1126/scitranslmed.aaf7314] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 12/25/2022]
Abstract
Next-generation sequencing technologies are fueling a wave of new diagnostic tests. Progress on a key set of nine research challenge areas will help generate the knowledge required to advance effectively these diagnostics to the clinic.
Affiliation(s)
- Russ B Altman
  - Bioengineering, Genetics, and Medicine, Stanford University, Stanford, CA 94305, USA
- Snehit Prabhu
  - Biomedical Data Science and Genetics, Stanford University, Stanford, CA 94305, USA
- Arend Sidow
  - Pathology and Genetics, Stanford University, Stanford, CA 94305, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
  - Material Measurement Laboratory, National Institute of Standards and Technology, Stanford University, Stanford, CA 94305, USA
- Rachel Goldfeder
  - Biomedical Informatics, Stanford University, Stanford, CA 94305, USA
- David Litwack
  - Food and Drug Administration, Silver Spring, MD 20993, USA
- Euan Ashley
  - Medicine, Genetics, and Pathology, Stanford University, Stanford, CA 94305, USA
- Carlos D Bustamante
  - Biomedical Data Science and Genetics, Stanford University, Stanford, CA 94305, USA
- Kathleen M Giacomini
  - Bioengineering and Therapeutic Sciences, University of California at San Francisco, San Francisco, CA 94143, USA
- Natalia Khuri
  - Bioengineering, Stanford University, Stanford, CA 94305, USA
- Eunice Lee
  - Food and Drug Administration, Silver Spring, MD 20993, USA
- Marc Salit
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
  - Bioengineering, Stanford University, Stanford, CA 94305, USA
  - Material Measurement Laboratory, National Institute of Standards and Technology, Stanford University, Stanford, CA 94305, USA
- Zivana Tezak
  - Food and Drug Administration, Silver Spring, MD 20993, USA
- Dennis P Wall
  - Systems Medicine and Psychiatry, Stanford University, Stanford, CA 94305, USA
- Taha Kass-Hout
  - Food and Drug Administration, Silver Spring, MD 20993, USA
31
Ochoa I, Hernaez M, Goldfeder R, Weissman T, Ashley E. Effect of lossy compression of quality scores on variant calling. Brief Bioinform 2017; 18:183-194. [PMID: 26966283 DOI: 10.1093/bib/bbw011] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/28/2015] [Indexed: 12/30/2022]
Abstract
Recent advancements in sequencing technology have led to a drastic reduction in genome sequencing costs. This development has generated an unprecedented amount of data that must be stored, processed, and communicated. To facilitate this effort, compression of genomic files has been proposed. Specifically, lossy compression of quality scores is emerging as a natural candidate for reducing the growing costs of storage. A main goal of performing DNA sequencing in population studies and clinical settings is to identify genetic variation. Though the field agrees that smaller files are advantageous, the cost of lossy compression, in terms of variant discovery, is unclear. Bioinformatic algorithms to identify SNPs and INDELs use base quality score information; here, we evaluate the effect of lossy compression of quality scores on SNP and INDEL detection. Specifically, we investigate how the output of the variant caller when using the original data differs from that obtained when quality scores are replaced by those generated by a lossy compressor. Using gold standard genomic datasets and simulated data, we are able to analyze how accurate the output of the variant calling is, both for the original data and that previously lossily compressed. We show that lossy compression can significantly alleviate the storage while maintaining variant calling performance comparable to that with the original data. Further, in some cases lossy compression can lead to variant calling performance that is superior to that using the original file. We envisage our findings and framework serving as a benchmark in future development and analyses of lossy genomic data compressors.
Affiliation(s)
- Idoia Ochoa
  - Electrical Engineering department, 350 Serra Mall, Stanford, CA, USA
- Mikel Hernaez
  - Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Rachel Goldfeder
  - Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Tsachy Weissman
  - Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Euan Ashley
  - Department of Medicine, Stanford University, Stanford, CA, USA
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
32
Personalized medicine-a modern approach for the diagnosis and management of hypertension. Clin Sci (Lond) 2017; 131:2671-2685. [PMID: 29109301 PMCID: PMC5736921 DOI: 10.1042/cs20160407] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 05/14/2017] [Revised: 09/22/2017] [Accepted: 09/25/2017] [Indexed: 12/15/2022]
Abstract
The main goal of treating hypertension is to reduce blood pressure to physiological levels and thereby prevent risk of cardiovascular disease and hypertension-associated target organ damage. Despite reductions in major risk factors and the availability of a plethora of effective antihypertensive drugs, the control of blood pressure to target values is still poor due to multiple factors including apparent drug resistance and lack of adherence. An explanation for this problem is related to the current reductionist and ‘trial-and-error’ approach in the management of hypertension, as we may oversimplify the complex nature of the disease and not pay enough attention to the heterogeneity of the pathophysiology and clinical presentation of the disorder. Taking into account specific risk factors, genetic phenotype, pharmacokinetic characteristics, and other particular features unique to each patient, would allow a personalized approach to managing the disease. Personalized medicine therefore represents the tailoring of medical approach and treatment to the individual characteristics of each patient and is expected to become the paradigm of future healthcare. The advancement of systems biology research and the rapid development of high-throughput technologies, as well as the characterization of different –omics, have contributed to a shift in modern biological and medical research from traditional hypothesis-driven designs toward data-driven studies and have facilitated the evolution of personalized or precision medicine for chronic diseases such as hypertension.
33
Catching hidden variation: systematic correction of reference minor allele annotation in clinical variant calling. Genet Med 2017; 20:360-364. [DOI: 10.1038/gim.2017.168] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/21/2017] [Accepted: 08/18/2017] [Indexed: 12/21/2022]
34
Goldfeder RL, Wall DP, Khoury MJ, Ioannidis JPA, Ashley EA. Human Genome Sequencing at the Population Scale: A Primer on High-Throughput DNA Sequencing and Analysis. Am J Epidemiol 2017; 186:1000-1009. [PMID: 29040395 DOI: 10.1093/aje/kww224] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 06/17/2016] [Accepted: 12/02/2016] [Indexed: 12/30/2022]
Abstract
Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action.
35
Abstract
There is great potential for genome sequencing to enhance patient care through improved diagnostic sensitivity and more precise therapeutic targeting. To maximize this potential, genomics strategies that have been developed for genetic discovery - including DNA-sequencing technologies and analysis algorithms - need to be adapted to fit clinical needs. This will require the optimization of alignment algorithms, attention to quality-coverage metrics, tailored solutions for paralogous or low-complexity areas of the genome, and the adoption of consensus standards for variant calling and interpretation. Global sharing of this more accurate genotypic and phenotypic data will accelerate the determination of causality for novel genes or variants. Thus, a deeper understanding of disease will be realized that will allow its targeting with much greater therapeutic precision.
Affiliation(s)
- Euan A Ashley
  - Center for Inherited Cardiovascular Disease, Falk Cardiovascular Research Building, Stanford Medicine, 870 Quarry Road, Stanford, California 94305, USA
36
Sweet K, Sturm AC, Schmidlen T, McElroy J, Scheinfeldt L, Manickam K, Gordon ES, Hovick S, Scott Roberts J, Toland AE, Christman M. Outcomes of a Randomized Controlled Trial of Genomic Counseling for Patients Receiving Personalized and Actionable Complex Disease Reports. J Genet Couns 2017; 26:980-998. [PMID: 28345121 DOI: 10.1007/s10897-017-0073-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/26/2016] [Accepted: 01/18/2017] [Indexed: 12/25/2022]
Abstract
There has been very limited study of patients with chronic disease receiving potentially actionable genomic based results or the utilization of genetic counselors in the online result delivery process. We conducted a randomized controlled trial on 199 patients with chronic disease each receiving eight personalized and actionable complex disease reports online. Primary study aims were to assess the impact of in-person genomic counseling on 1) causal attribution of disease risk, 2) personal awareness of disease risk, and 3) perceived risk of developing a particular disease. Of 98 intervention arm participants (mean age = 57.8; 39% female) randomized for in-person genomic counseling, 76 (78%) were seen. In contrast, control arm participants (n = 101; mean age = 58.5; 54% female) were initially not offered genomic counseling as part of the study protocol but were able to access in-person genomic counseling, if they requested it, 3-months post viewing of at least one test report and post-completion of the study-specific follow-up survey. A total of 64 intervention arm and 59 control arm participants completed follow-up survey measures. We found that participants receiving in-person genomic counseling had enhanced objective understanding of the genetic variant risk contribution for multiple complex diseases. Genomic counseling was associated with lowered participant causal beliefs in genetic influence across all eight diseases, compared to control participants. Our findings also illustrate that for the majority of diseases under study, intervention arm participants believed they knew their genetic risk status better than control arm subjects. Disease risk was modified for the majority during genomic counseling, due to the assessment of more comprehensive family history. 
In conclusion, for patients receiving personalized and actionable genomic results through a web portal, genomic counseling enhanced their objective understanding of the genetic variant risk contribution to multiple common diseases. These results support the development of additional genomic counseling interventions to ensure a high level of patient comprehension and improve patient-centered health outcomes.
Affiliation(s)
- Kevin Sweet
  - Division of Human Genetics, Ohio State University Wexner Medical Center, Columbus, OH, 43420, USA
  - Division of Human Genetics, Ohio State University, 2001 Polaris Parkway, Columbus, OH, 43212, USA
- Amy C Sturm
  - Division of Human Genetics, Ohio State University Wexner Medical Center, Columbus, OH, 43420, USA
  - Dorothy M. Davis Heart and Lung Research Institute, Ohio State University Wexner Medical Center, Columbus, OH, 43420, USA
- Tara Schmidlen
  - Coriell Institute for Medical Research, 403 Haddon Avenue, Camden, NJ, 08103, USA
- Joseph McElroy
  - Department of Biomedical Informatics, Center for Biostatistics, Columbus, OH, 43221, USA
- Laura Scheinfeldt
  - Coriell Institute for Medical Research, 403 Haddon Avenue, Camden, NJ, 08103, USA
  - Temple University, SERC Building, 1925 N. 12th St, Philadelphia, PA, 19122-1801, USA
- Kandamurugu Manickam
  - Geisinger Health System, Genomic Medicine Institute, Precision Health Center, 190 Welles Street, Suite 128, Forty Fort, PA, 18704, USA
- Erynn S Gordon
  - Coriell Institute for Medical Research, 403 Haddon Avenue, Camden, NJ, 08103, USA
  - Genome Medical, Monterey, CA, 93940, USA
- Shelly Hovick
  - School of Communication, Ohio State University, Columbus, OH, 43214, USA
- J Scott Roberts
  - Department of Health Behavior & Health Education, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Amanda Ewart Toland
  - Division of Human Genetics, Ohio State University Wexner Medical Center, Columbus, OH, 43420, USA
- Michael Christman
  - Coriell Institute for Medical Research, 403 Haddon Avenue, Camden, NJ, 08103, USA
37
Evaluating the Calling Performance of a Rare Disease NGS Panel for Single Nucleotide and Copy Number Variants. Mol Diagn Ther 2017; 21:303-313. [DOI: 10.1007/s40291-017-0268-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022]
38
van der Merwe N, Peeters AV, Pienaar FM, Bezuidenhout J, van Rensburg SJ, Kotze MJ. Exome Sequencing in a Family with Luminal-Type Breast Cancer Underpinned by Variation in the Methylation Pathway. Int J Mol Sci 2017; 18:E467. [PMID: 28241424 PMCID: PMC5343999 DOI: 10.3390/ijms18020467] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/02/2017] [Revised: 01/31/2017] [Accepted: 02/10/2017] [Indexed: 01/19/2023]
Abstract
Panel-based next generation sequencing (NGS) is currently preferred over whole exome sequencing (WES) for diagnosis of familial breast cancer, due to interpretation challenges caused by variants of uncertain clinical significance (VUS). There is also no consensus on the selection criteria for WES. In this study, a pathology-supported genetic testing (PSGT) approach was used to select two BRCA1/2 mutation-negative breast cancer patients from the same family for WES. Homozygosity for the MTHFR 677 C>T mutation detected during this PSGT pre-screen step was considered insufficient to cause bilateral breast cancer in the index case and her daughter diagnosed with early-onset breast cancer (<30 years). Extended genetic testing using WES identified the RAD50 R385C missense mutation in both cases. This rare variant with a minor allele frequency (MAF) of <0.001 was classified as a VUS after exclusion in an affected cousin and extended genotyping in 164 unrelated breast cancer patients and 160 controls. Detection of functional polymorphisms (MAF > 5%) in the folate pathway in all three affected family members is consistent with inheritance of the luminal-type breast cancer in the family. PSGT assisted with the decision to pursue extended genetic testing and facilitated clinical interpretation of WES aimed at reduction of recurrence risk.
Affiliation(s)
- Nicole van der Merwe
  - Division of Anatomical Pathology, Department of Pathology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg 7500, South Africa.
- Armand V Peeters
  - Division of Anatomical Pathology, Department of Pathology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg 7500, South Africa.
- Juanita Bezuidenhout
  - Division of Anatomical Pathology, Department of Pathology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg 7500, South Africa.
- Susan J van Rensburg
  - Division of Chemical Pathology, Department of Pathology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg 7500, South Africa.
- Maritha J Kotze
  - Division of Chemical Pathology, Department of Pathology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg 7500, South Africa.
  - National Health Laboratory Service, Tygerberg Hospital, Tygerberg 7500, South Africa.
|
39
|
Mias GI, Yusufaly T, Roushangar R, Brooks LRK, Singh VV, Christou C. MathIOmica: An Integrative Platform for Dynamic Omics. Sci Rep 2016; 6:37237. [PMID: 27883025 PMCID: PMC5121649 DOI: 10.1038/srep37237] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 09/19/2016] [Accepted: 10/25/2016] [Indexed: 12/13/2022]
Abstract
Multiple omics data are rapidly becoming available, necessitating the use of new methods to integrate different technologies and interpret the results arising from multimodal assaying. The MathIOmica package for Mathematica provides one of the first extensive introductions to the use of the Wolfram Language to tackle such problems in bioinformatics. The package particularly addresses the necessity to integrate multiple omics information arising from dynamic profiling in a personalized medicine approach. It provides multiple tools to facilitate bioinformatics analysis, including importing data, annotating datasets, tracking missing values, normalizing data, clustering and visualizing the classification of data, carrying out annotation and enumeration of ontology memberships and pathway analysis. We anticipate MathIOmica to not only help in the creation of new bioinformatics tools, but also in promoting interdisciplinary investigations, particularly from researchers in mathematical, physical science and engineering fields transitioning into genomics, bioinformatics and omics data integration.
Affiliation(s)
- George I. Mias
  - Michigan State University, Biochemistry and Molecular Biology, East Lansing, MI 48824, USA
- Tahir Yusufaly
  - University of Southern California, Department of Physics and Astronomy, Los Angeles, CA, 90089, USA
- Raeuf Roushangar
  - Michigan State University, Biochemistry and Molecular Biology, East Lansing, MI 48824, USA
- Lavida R. K. Brooks
  - Michigan State University, Biochemistry and Molecular Biology, East Lansing, MI 48824, USA
- Vikas V. Singh
  - Michigan State University, Biochemistry and Molecular Biology, East Lansing, MI 48824, USA
- Christina Christou
  - Mercy Cancer Center, Department of Radiation Oncology, Mason City, IA 50401, USA
|
40
|
An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes. Nat Commun 2016; 7:13637. [PMID: 27882922 PMCID: PMC5123046 DOI: 10.1038/ncomms13637] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 03/24/2016] [Accepted: 10/18/2016] [Indexed: 12/20/2022]
Abstract
Human genomes are routinely compared against a universal reference. However, this strategy could miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here we report a hybrid assembly of a Korean reference genome (KOREF) for constructing personal and ethnic references by combining sequencing and mapping methods. We also build its consensus variome reference, providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. We find that the ethnically relevant consensus reference can be beneficial for efficient variant detection. Systematic comparison of human assemblies shows the importance of assembly quality, suggesting the necessity of new technologies to comprehensively map ethnic and personal genomic structure variations. In the era of large-scale population genome projects, the leveraging of ethnicity-specific genome assemblies as well as the human reference genome will accelerate mapping all human genome diversity.
|
41
|
Karthikeyan S, Bawa PS, Srinivasan S. hg19K: addressing a significant lacuna in hg19-based variant calling. Mol Genet Genomic Med 2016; 5:15-20. [PMID: 28116326 PMCID: PMC5241214 DOI: 10.1002/mgg3.251] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/01/2016] [Revised: 08/29/2016] [Accepted: 09/05/2016] [Indexed: 11/09/2022]
Abstract
Background: The hg19 assembly of the human genome is the most heavily annotated and most commonly used reference to make variant calls for individual genomes. Based on the phase 3 report of the 1000 genomes project (1000G), it is now well known that many positions in the hg19 genome represent minor alleles. Since commonly used variant call methods are developed under the assumption that hg19 reference harbors major alleles at all the ~3 billion positions, these methods mask the calls whenever an individual is homozygous to the minor allele at the respective positions. Hence, it is important to address the extent and impact of these minor alleles in hg19 from the point of view of individual genomes. Method: We have created a reference genome, hg19K, in which all the positions in hg19 reference harboring minor allele were replaced by those from the phase 3 report of the 1000 genomes project. The genomes of five individuals, downloaded from the public repository, were analyzed using both hg19 and hg19K and compared. Results: Out of the 81 million SNPs in phase 3 report from the 1000 genomes project, 1.9 million positions were found to be major alleles compared to hg19 with many having an allele frequency of >0.9. We observed that ~30% of the SNVs found in individual genomes are confined to the 1.9 million positions. Also, there are ~8% unique SNVs predicted using hg19K-based approach, which are also confined to the 1.9 million positions. Conclusion: We report that the presence of minor alleles in hg19 alone results in ~8% false negatives and ~30% false positives during variant calls. Also, among the variant calls unique to hg19K-based methods, which are missed in individuals homozygous to the minor alleles in hg19-based prediction, some are deleterious missense mutations at sites conserved across diverse species.
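The substitution step described in the Method above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `build_major_allele_reference`, the in-memory dictionary of sequences, the tuple layout of the variant records, and the 0.5 frequency threshold are all hypothetical choices made for the example. Positions are 1-based and only biallelic single-nucleotide variants are handled.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout:
# (chromosome, 1-based position, reference allele, alternate allele,
#  alternate allele frequency in the population panel)
Variant = Tuple[str, int, str, str, float]


def build_major_allele_reference(
    reference: Dict[str, str],
    variants: Iterable[Variant],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Return a copy of `reference` in which the reference base is replaced
    by the alternate allele wherever the alternate allele frequency exceeds
    `threshold`, i.e. wherever the reference carries the minor allele."""
    edited = {chrom: list(seq) for chrom, seq in reference.items()}
    for chrom, pos, ref, alt, alt_freq in variants:
        if len(ref) != 1 or len(alt) != 1:
            continue  # this sketch skips indels and multi-base records
        if chrom not in edited:
            continue  # variant on a sequence we were not given
        idx = pos - 1  # convert 1-based position to 0-based index
        if not 0 <= idx < len(edited[chrom]):
            raise ValueError(f"{chrom}:{pos} lies outside the sequence")
        if edited[chrom][idx].upper() != ref.upper():
            raise ValueError(f"reference mismatch at {chrom}:{pos}")
        if alt_freq > threshold:
            edited[chrom][idx] = alt
    return {chrom: "".join(bases) for chrom, bases in edited.items()}


# Toy example: only the first record has the reference as the minor allele.
toy_reference = {"chr1": "ACGTACGT"}
toy_variants = [
    ("chr1", 2, "C", "T", 0.93),   # replaced: alternate allele is the major allele
    ("chr1", 5, "A", "G", 0.10),   # kept: reference allele is already the major allele
    ("chr1", 7, "G", "GA", 0.80),  # skipped: indel
]
toy_result = build_major_allele_reference(toy_reference, toy_variants)
```

Reads would then be mapped and variants called against the edited sequence in the same way as against the original one; coordinates stay unchanged because only single bases are swapped.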
Affiliation(s)
- Savita Karthikeyan
  - Institute of Bioinformatics and Applied Biotechnology Biotech Park, Electronic City Phase I Bangalore 560100 India
- Pushpinder S Bawa
  - Institute of Bioinformatics and Applied Biotechnology Biotech Park, Electronic City Phase I Bangalore 560100 India
- Subhashini Srinivasan
  - Institute of Bioinformatics and Applied Biotechnology Biotech Park, Electronic City Phase I Bangalore 560100 India
|
42
|
The Qatar genome: a population-specific tool for precision medicine in the Middle East. Hum Genome Var 2016; 3:16016. [PMID: 27408750 PMCID: PMC4927697 DOI: 10.1038/hgv.2016.16] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 08/26/2015] [Revised: 03/09/2016] [Accepted: 04/11/2016] [Indexed: 12/30/2022]
Abstract
Reaching the full potential of precision medicine depends on the quality of personalized genome interpretation. In order to facilitate precision medicine in regions of the Middle East and North Africa (MENA), a population-specific genome for the indigenous Arab population of Qatar (QTRG) was constructed by incorporating allele frequency data from sequencing of 1,161 Qataris, representing 0.4% of the population. A total of 20.9 million single nucleotide polymorphisms (SNPs) and 3.1 million indels were observed in Qatar, including an average of 1.79% novel variants per individual genome. Replacement of the GRCh37 standard reference with QTRG in a best practices genome analysis workflow resulted in an average of 7× deeper coverage depth (an improvement of 23%) and 756,671 fewer variants on average, a reduction of 16% that is attributed to common Qatari alleles being present in QTRG. The benefit for using QTRG varies across ancestries, a factor that should be taken into consideration when selecting an appropriate reference for analysis.
|
43
|
Linderman MD, Nielsen DE, Green RC. Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium. J Pers Med 2016; 6:E14. [PMID: 27023617 PMCID: PMC4932461 DOI: 10.3390/jpm6020014] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 10/21/2015] [Revised: 03/09/2016] [Accepted: 03/15/2016] [Indexed: 12/16/2022]
Abstract
Thousands of ostensibly healthy individuals have had their exome or genome sequenced, but a much smaller number of these individuals have received any personal genomic results from that sequencing. We term those projects in which ostensibly healthy participants can receive sequencing-derived genetic findings and may also have access to their genomic data as participatory predispositional personal genome sequencing (PPGS). Here we are focused on genome sequencing applied in a pre-symptomatic context and so define PPGS to exclude diagnostic genome sequencing intended to identify the molecular cause of suspected or diagnosed genetic disease. In this report we describe the design of completed and underway PPGS projects, briefly summarize the results reported to date and introduce the PeopleSeq Consortium, a newly formed collaboration of PPGS projects designed to collect much-needed longitudinal outcome data.
Affiliation(s)
- Michael D Linderman
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
- Daiva E Nielsen
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA.
  - Harvard Medical School, Boston, MA 02115, USA.
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Robert C Green
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA.
  - Harvard Medical School, Boston, MA 02115, USA.
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
  - Partners Personalized Medicine, Cambridge, MA 02139, USA.
|
44
|
Goldfeder RL, Priest JR, Zook JM, Grove ME, Waggott D, Wheeler MT, Salit M, Ashley EA. Medical implications of technical accuracy in genome sequencing. Genome Med 2016; 8:24. [PMID: 26932475 PMCID: PMC4774017 DOI: 10.1186/s13073-016-0269-0] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Received: 08/21/2015] [Accepted: 01/21/2016] [Indexed: 12/31/2022]
Abstract
Background: As whole exome sequencing (WES) and whole genome sequencing (WGS) transition from research tools to clinical diagnostic tests, it is increasingly critical for sequencing methods and analysis pipelines to be technically accurate. The Genome in a Bottle Consortium has recently published a set of benchmark SNV, indel, and homozygous reference genotypes for the pilot whole genome NIST Reference Material based on the NA12878 genome. Methods: We examine the relationship between human genome complexity and genes/variants reported to be associated with human disease. Specifically, we map regions of medical relevance to benchmark regions of high or low confidence. We use benchmark data to assess the sensitivity and positive predictive value of two representative sequencing pipelines for specific classes of variation. Results: We observe that the accuracy of a variant call depends on the genomic region, variant type, and read depth, and varies by analytical pipeline. We find that most false negative WGS calls result from filtering while most false negative WES variants relate to poor coverage. We find that only 74.6% of the exonic bases in ClinVar and OMIM genes and 82.1% of the exonic bases in ACMG-reportable genes are found in high-confidence regions. Only 990 genes in the genome are found entirely within high-confidence regions while 593 of 3,300 ClinVar/OMIM genes have less than 50% of their total exonic base pairs in high-confidence regions. We find greater than 77% of the pathogenic or likely pathogenic SNVs currently in ClinVar fall within high-confidence regions. We identify sites that are prone to sequencing errors, including thousands present in publicly available variant databases. Finally, we examine the clinical impact of mandatory reporting of secondary findings, highlighting a false positive variant found in BRCA2.
Conclusions: Together, these data illustrate the importance of appropriate use and continued improvement of technical benchmarks to ensure accurate and judicious interpretation of next-generation DNA sequencing results in the clinical setting.
Affiliation(s)
- Rachel L Goldfeder
  - Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA.
- James R Priest
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA.
  - Department of Pediatrics, Stanford University, Stanford, CA, 94305, USA.
- Justin M Zook
  - Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA.
- Megan E Grove
  - Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA.
- Daryl Waggott
  - Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA.
- Matthew T Wheeler
  - Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA.
- Marc Salit
  - Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA.
- Euan A Ashley
  - Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA.
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA.
|
45
|
Human genetic variation database, a reference database of genetic variations in the Japanese population. J Hum Genet 2016; 61:547-53. [PMID: 26911352 PMCID: PMC4931044 DOI: 10.1038/jhg.2016.12] [Citation(s) in RCA: 229] [Impact Index Per Article: 28.6] [Received: 10/22/2015] [Revised: 01/20/2016] [Accepted: 01/21/2016] [Indexed: 12/13/2022]
Abstract
Whole-genome and -exome resequencing using next-generation sequencers is a powerful approach for identifying genomic variations that are associated with diseases. However, systematic strategies for prioritizing causative variants from many candidates to explain the disease phenotype are still far from being established, because the population-specific frequency spectrum of genetic variation has not been characterized. Here, we have collected exomic genetic variation from 1208 Japanese individuals through a collaborative effort, and aggregated the data into a prevailing catalog. In total, we identified 156 622 previously unreported variants. The allele frequencies for the majority (88.8%) were lower than 0.5% in allele frequency and predicted to be functionally deleterious. In addition, we have constructed a Japanese-specific major allele reference genome by which the number of unique mapping of the short reads in our data has increased 0.045% on average. Our results illustrate the importance of constructing an ethnicity-specific reference genome for identifying rare variants. All the collected data were centralized to a newly developed database to serve as useful resources for exploring pathogenic variations. Public access to the database is available at http://www.genome.med.kyoto-u.ac.jp/SnpDB/.
|
46
|
Dewey FE, Grove ME, Priest JR, Waggott D, Batra P, Miller CL, Wheeler M, Zia A, Pan C, Karzcewski KJ, Miyake C, Whirl-Carrillo M, Klein TE, Datta S, Altman RB, Snyder M, Quertermous T, Ashley EA. Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data. PLoS Genet 2015; 11:e1005496. [PMID: 26448358 PMCID: PMC4598191 DOI: 10.1371/journal.pgen.1005496] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 11/05/2014] [Accepted: 08/13/2015] [Indexed: 12/11/2022]
Abstract
High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework. Technological advances have dramatically reduced the cost of sequencing the human genome. Tools for analyzing such data across families, including annotation of clinically important variants and aggregation of variants for personalizing drug prescriptions, have been developed but few are publicly available. Here we describe such tools, then demonstrate their application in several distinct data sets. In particular, we use the tools to define the genetic basis of a new congenital arrhythmia syndrome.
Affiliation(s)
- Frederick E. Dewey
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
  - Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Megan E. Grove
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
  - Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- James R. Priest
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
  - Division of Pediatric Cardiology, Stanford University, Stanford, California, United States of America
- Daryl Waggott
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
- Prag Batra
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
- Clint L. Miller
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
  - Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Matthew Wheeler
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
  - Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Amin Zia
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, California, United States of America
  - Department of Genetics, Stanford University, Stanford, California, United States of America
- Cuiping Pan
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, California, United States of America
  - Department of Genetics, Stanford University, Stanford, California, United States of America
- Konrad J. Karzcewski
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, California, United States of America
  - Department of Genetics, Stanford University, Stanford, California, United States of America
  - Biomedical Informatics Training Program, Stanford University, Stanford, California, United States of America
- Christina Miyake
  - Division of Pediatric Cardiology, Stanford University, Stanford, California, United States of America
- Teri E. Klein
  - Department of Genetics, Stanford University, Stanford, California, United States of America
- Somalee Datta
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, California, United States of America
- Russ B. Altman
  - Department of Genetics, Stanford University, Stanford, California, United States of America
- Michael Snyder
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, California, United States of America
  - Department of Genetics, Stanford University, Stanford, California, United States of America
- Thomas Quertermous
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
  - Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Euan A. Ashley
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, California, United States of America
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California, United States of America
  - Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
  - Department of Genetics, Stanford University, Stanford, California, United States of America
|
47
|
Reinert K, Langmead B, Weese D, Evers DJ. Alignment of Next-Generation Sequencing Reads. Annu Rev Genomics Hum Genet 2015; 16:133-51. [DOI: 10.1146/annurev-genom-090413-025358] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Indexed: 11/09/2022]
Affiliation(s)
- Knut Reinert
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Ben Langmead
  - Department of Computer Science and Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland 21218
- David Weese
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
|
48
|
Yuan S, Johnston HR, Zhang G, Li Y, Hu YJ, Qin ZS. One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies. PLoS Comput Biol 2015; 11:e1004448. [PMID: 26267278 PMCID: PMC4534450 DOI: 10.1371/journal.pcbi.1004448] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/16/2015] [Accepted: 07/13/2015] [Indexed: 12/13/2022]
Abstract
With rapid decline of the sequencing cost, researchers today rush to embrace whole genome sequencing (WGS), or whole exome sequencing (WES) approach as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable if they can be factored into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice, which can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.
Affiliation(s)
- Shuai Yuan
  - Mathematics & Computer Science Department, Emory University, Atlanta, Georgia, United States of America
- H. Richard Johnston
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America
- Guosheng Zhang
  - Department of Genetics, Department of Biostatistics, Department of Computer Science, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina, United States of America
- Yun Li
  - Department of Genetics, Department of Biostatistics, Department of Computer Science, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina, United States of America
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America
- Zhaohui S. Qin
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America
|
49
|
Royer-Bertrand B, Rivolta C. Whole genome sequencing as a means to assess pathogenic mutations in medical genetics and cancer. Cell Mol Life Sci 2015; 72:1463-71. [PMID: 25548800 PMCID: PMC11113357 DOI: 10.1007/s00018-014-1807-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 06/29/2014] [Revised: 12/12/2014] [Accepted: 12/15/2014] [Indexed: 12/17/2022]
Abstract
The past decade has seen the emergence of next-generation sequencing (NGS) technologies, which have revolutionized the field of human molecular genetics. With NGS, significant portions of the human genome can now be assessed by direct sequence analysis, highlighting normal and pathological variants of our DNA. Recent advances have also allowed the sequencing of complete genomes, by a method referred to as whole genome sequencing (WGS). In this work, we review the use of WGS in medical genetics, with specific emphasis on the benefits and the disadvantages of this technique for detecting genomic alterations leading to Mendelian human diseases and to cancer.
Affiliation(s)
- Beryl Royer-Bertrand
  - Department of Medical Genetics, University of Lausanne, Rue Du Bugnon 27, 1005 Lausanne, Switzerland
- Carlo Rivolta
  - Department of Medical Genetics, University of Lausanne, Rue Du Bugnon 27, 1005 Lausanne, Switzerland
|
50
|
Affiliation(s)
- David J Amor
  - Royal Children's Hospital, Murdoch Childrens Research Institute, Melbourne, Victoria, Australia; Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
|