1
Sun S, Cheng F, Han D, Wei S, Zhong A, Massoudian S, Johnson AB. Pairwise comparative analysis of six haplotype assembly methods based on users' experience. BMC Genom Data 2023; 24:35. [PMID: 37386408 PMCID: PMC10311811 DOI: 10.1186/s12863-023-01134-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 05/25/2023] [Indexed: 07/01/2023]
Abstract
BACKGROUND A haplotype is a set of DNA variants inherited together from one parent or chromosome. Haplotype information is useful for studying genetic variation and disease association. Haplotype assembly (HA) is the process of obtaining haplotypes from DNA sequencing data. Many HA methods are currently available, each with its own strengths and weaknesses. This study compared six HA methods or algorithms (HapCUT2, MixSIH, PEATH, WhatsHap, SDhaP, and MAtCHap) using two NA12878 datasets named hg19 and hg38. The six HA algorithms were run on chromosome 10 of these two datasets, each with 3 filtering levels based on sequencing depth (DP1, DP15, and DP30), and their outputs were then compared. RESULT Run time (CPU time) was compared to assess the efficiency of the six HA methods. HapCUT2 was the fastest HA method on all 6 datasets, with run time consistently under 2 min. WhatsHap was also relatively fast, with a run time of 21 min or less for all 6 datasets. The run times of the other 4 HA algorithms varied across datasets and coverage levels. To assess accuracy, pairwise comparisons were conducted for each pair of the six packages by computing their disagreement rates for both haplotype blocks and Single Nucleotide Variants (SNVs). The authors also compared them using switch distance (error), i.e., the number of positions where the two chromosomes of a given phase must be switched to match the known haplotype. HapCUT2, PEATH, MixSIH, and MAtCHap generated output files with similar numbers of blocks and SNVs, and their performance was relatively similar. WhatsHap generated a much larger number of SNVs in the hg19 DP1 output, which caused high disagreement percentages between it and the other methods. However, for the hg38 data, WhatsHap performed similarly to the other algorithms, with the exception of SDhaP. The comparison analysis showed that SDhaP had a much larger disagreement rate than the other algorithms in all 6 datasets.
CONCLUSION The comparative analysis is important because each algorithm is different. The findings of this study provide a deeper understanding of the performance of currently available HA algorithms and useful input for other users.
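The switch distance mentioned in the abstract can be illustrated with a short sketch. It assumes (this is not taken from the paper) that each haplotype is a string of '0'/'1' alleles over the same ordered heterozygous sites of one phased block, with no missing calls, and that the second haplotype of each pair is the complement of the first. The function name `switch_distance` is hypothetical; this is a generic definition, not the authors' evaluation code.

```python
def switch_distance(predicted: str, truth: str) -> int:
    """Count switch errors between a predicted and a known haplotype.

    Both arguments are strings of '0'/'1' alleles at the same ordered
    heterozygous sites of one phased block (an illustrative assumption);
    the other haplotype of each pair is taken to be the complement.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    # True where the prediction matches the known haplotype at a site,
    # False where it matches the complementary haplotype instead.
    agrees = [p == t for p, t in zip(predicted, truth)]
    # A switch is needed wherever that state flips between adjacent sites.
    return sum(a != b for a, b in zip(agrees, agrees[1:]))


if __name__ == "__main__":
    print(switch_distance("0011", "0000"))  # 1: one switch between sites 2 and 3
```

Because only flips between adjacent sites are counted, a prediction that lies entirely on the opposite phase (for example "1111" against "0000") scores 0, which matches the idea that the two chromosomes of a phase are interchangeable within a block.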
Affiliation(s)
- Shuying Sun
- Department of Mathematics, Texas State University, San Marcos, TX USA
- Flora Cheng
- Carnegie Mellon University, Pittsburgh, PA USA
- Daphne Han
- Carnegie Mellon University, Pittsburgh, PA USA
- Sarah Wei
- Massachusetts Institute of Technology, Cambridge, MA USA
2
Zhang T, Zhou J, Gao W, Jia Y, Wei Y, Wang G. Complex genome assembly based on long-read sequencing. Brief Bioinform 2022; 23:6657663. [PMID: 35940845 DOI: 10.1093/bib/bbac305] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/06/2022] [Revised: 06/20/2022] [Accepted: 07/06/2022] [Indexed: 11/12/2022]
Abstract
High-quality chromosome-scale genome sequences provide an important basis for downstream genomic analysis, especially the construction of haplotype-resolved and complete genomes, which play a key role in genome annotation, mutation detection, evolutionary analysis, gene function research, comparative genomics and other areas. However, genome-wide short-read sequencing struggles to produce a complete assembly of complex genomes with high duplication and heterozygosity. The emergence of long-read sequencing technology has greatly improved the completeness of complex genome assembly. We review a variety of computational methods for complex genome assembly and describe in detail the theories, innovations and shortcomings of collapsed, semi-collapsed and uncollapsed assemblers based on long reads. Among the three approaches, uncollapsed assembly is the most correct and complete way to represent genomes. In addition, genome assembly is closely related to haplotype reconstruction: uncollapsed assembly realizes haplotype reconstruction, and haplotype reconstruction in turn promotes uncollapsed assembly. We hope that in the future, gapless, telomere-to-telomere and accurate assembly of complex genomes can be routinely achieved using only a simple process or a single tool.
Affiliation(s)
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Jie Zhou
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Wentao Gao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Yuran Jia
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Yanan Wei
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Guohua Wang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
3
Markowski J, Kempfer R, Kukalev A, Irastorza-Azcarate I, Loof G, Kehr B, Pombo A, Rahmann S, Schwarz RF. GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data. Bioinformatics 2021; 37:3128-3135. [PMID: 33830196 PMCID: PMC8504635 DOI: 10.1093/bioinformatics/btab238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2020] [Revised: 02/22/2021] [Accepted: 04/07/2021] [Indexed: 11/24/2022]
Abstract
MOTIVATION Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. Orthogonal to existing approaches based on chromatin conformation capture (3C), GAM's ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools to investigate the effects of genetic variants on many cellular phenotypes including chromatin conformation, but require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes. RESULTS We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimize phasing accuracy. Using a hybrid mouse embryonic stem cell line with known haplotype structure as a benchmark dataset, we assess correctness and completeness of the reconstructed haplotypes, and demonstrate the power of GAMIBHEAR to infer accurate genome-wide haplotypes from GAM data. AVAILABILITY AND IMPLEMENTATION GAMIBHEAR is available as an R package under the open-source GPL-2 license at https://bitbucket.org/schwarzlab/gamibhear. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
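GAMIBHEAR's own model is probabilistic and GAM-specific; the sketch below illustrates only the simpler underlying idea of aggregating allelic co-observations across GAM profiles and turning them into a naive majority-vote phase call for a pair of SNVs. Everything here is a hypothetical illustration: the input layout (one dict per nuclear profile, mapping a heterozygous SNV index to a single observed allele, 0 = reference, 1 = alternate) and the function names are assumptions, and real GAM profiles can capture both homologs at a locus, which this simplification ignores.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical input: one dict per GAM nuclear profile, mapping a heterozygous
# SNV index to the single allele observed there (0 = reference, 1 = alternate).
Profile = Dict[int, int]


def co_observation_counts(profiles: List[Profile]) -> Dict[Tuple[int, int], List[int]]:
    """Count, for every SNV pair (i < j), how often alleles were co-observed in
    concordant (0/0 or 1/1) versus discordant (0/1 or 1/0) combinations."""
    counts: Dict[Tuple[int, int], List[int]] = defaultdict(lambda: [0, 0])
    for profile in profiles:
        # Every pair of SNVs observed together in the same profile
        for i, j in combinations(sorted(profile), 2):
            if profile[i] == profile[j]:
                counts[(i, j)][0] += 1  # concordant co-observation
            else:
                counts[(i, j)][1] += 1  # discordant co-observation
    return dict(counts)


def naive_relative_phase(counts: Dict[Tuple[int, int], List[int]], i: int, j: int) -> Optional[str]:
    """Majority vote: 'cis' if the alternate alleles of SNVs i < j appear to lie
    on the same homolog, 'trans' if on different homologs, None if undecided."""
    concordant, discordant = counts.get((i, j), [0, 0])
    if concordant == discordant:
        return None
    return "cis" if concordant > discordant else "trans"
```

A plain majority vote treats every co-observation as equally reliable; a dedicated model such as the one described in the abstract can instead account for how GAM captures haplotypes, which is the motivation for not stopping at raw counts.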
Affiliation(s)
- Julia Markowski
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115 Berlin, Germany
- Humboldt-Universität zu Berlin, Department of Biology, 10099 Berlin, Germany
- Rieke Kempfer
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115 Berlin, Germany
- Humboldt-Universität zu Berlin, Department of Biology, 10099 Berlin, Germany
- Alexander Kukalev
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115 Berlin, Germany
- Ibai Irastorza-Azcarate
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115 Berlin, Germany
- Gesa Loof
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115 Berlin, Germany
- Humboldt-Universität zu Berlin, Department of Biology, 10099 Berlin, Germany
- Birte Kehr
- Berlin Institute of Health (BIH) at Charité–Universitätsmedizin Berlin, 10177 Berlin, Germany
- Regensburg Center for Interventional Immunology (RCI), 93053 Regensburg, Germany
- Universität Regensburg, 93053 Regensburg, Germany
- Ana Pombo
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115 Berlin, Germany
- Humboldt-Universität zu Berlin, Department of Biology, 10099 Berlin, Germany
- Sven Rahmann
- Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, 45122 Essen, Germany
- Roland F Schwarz
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115 Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), 10623 Berlin, Germany
4
Yan Z, Zhu X, Wang Y, Nie Y, Guan S, Kuo Y, Chang D, Li R, Qiao J, Yan L. scHaplotyper: haplotype construction and visualization for genetic diagnosis using single cell DNA sequencing data. BMC Bioinformatics 2020; 21:41. [PMID: 32007105 PMCID: PMC6995221 DOI: 10.1186/s12859-020-3381-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/07/2019] [Accepted: 01/22/2020] [Indexed: 12/19/2022]
Abstract
BACKGROUND In preimplantation genetic diagnosis (PGD), haplotyping reveals the chromosome blocks that in vitro fertilized (IVF) embryos inherit from their parents, making it possible to observe the transmission of disease alleles between generations. However, haplotyping methods suitable for single cells are limited, because in PGD a whole genome amplification (WGA) step is performed before sequencing or genotyping, so the true haplotype profiles of embryos must be constructed from genotypes that can contain many WGA artifacts. RESULTS Here, we offer scHaplotyper as a genetic diagnosis tool that reconstructs and visualizes the haplotype profiles of single cells based on a Hidden Markov Model (HMM). scHaplotyper can trace the origin of each haplotype block in the embryo, enabling detection of the carrier status of disease alleles in each embryo. We applied this method in PGD for two families affected by genetic disorders, which resulted in the healthy live births of two children in the two families, demonstrating the clinical applicability of this method. CONCLUSION Next generation sequencing (NGS) of preimplantation embryos enables genetic screening for families with genetic disorders, avoiding the birth of affected babies. With this validation and successful clinical application, we showed that scHaplotyper is a convenient and accurate method for screening embryos. More patients with genetic disorders will benefit from the genetic diagnosis of embryos. The source code of scHaplotyper is available in the GitHub repository: https://github.com/yzqheart/scHaplotyper.
Affiliation(s)
- Zhiqiang Yan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Xiaohui Zhu
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Yuqian Wang
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Yanli Nie
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Shuo Guan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Ying Kuo
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Di Chang
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Rong Li
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Jie Qiao
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, 100871, China
- Liying Yan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
5
Tangherloni A, Spolaor S, Rundo L, Nobile MS, Cazzaniga P, Mauri G, Liò P, Merelli I, Besozzi D. GenHap: a novel computational method based on genetic algorithms for haplotype assembly. BMC Bioinformatics 2019; 20:172. [PMID: 30999845 PMCID: PMC6471693 DOI: 10.1186/s12859-019-2691-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 02/06/2023]
Abstract
Background In order to fully characterize the genome of an individual, the reconstruction of the two distinct copies of each chromosome, called haplotypes, is essential. The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype assembly, and consists of assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. Indeed, knowledge of complete haplotypes is generally more informative than analyzing single SNPs and plays a fundamental role in many medical applications. Results To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, which is a successful approach for haplotype assembly. This NP-hard problem consists of computing the two haplotypes that partition the sequencing reads into two disjoint subsets with the fewest corrections to the SNP values. To this aim, we propose GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, yielding optimal solutions by means of a global search process. To evaluate the effectiveness of our approach, we ran GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high-accuracy solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol on the Roche/454 instances and up to 20× faster on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets. Conclusions Future-generation sequencing technologies, producing longer reads with higher coverage, can benefit greatly from GenHap, thanks to its capability of efficiently solving large instances of the haplotype assembly problem.
Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and the full documentation are available at the following GitHub repository: https://github.com/andrea-tango/GenHap.
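To make the wMEC objective described above concrete, here is a minimal sketch of how the cost of one candidate haplotype might be evaluated. Assumptions (not from the GenHap paper): every site is heterozygous, so the second haplotype is the complement of the first; each read fragment is a hypothetical dict mapping an SNV position to an (allele, weight) pair, where the weight could be a base-call confidence; and each read is assigned to whichever haplotype needs the smaller weighted correction. The exhaustive minimizer is included only to show why the search space (2^n haplotypes for n sites) calls for heuristics; it is not GenHap's genetic algorithm.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical read fragment: SNV position -> (observed allele 0/1, weight),
# where the weight might be a confidence derived from the base quality.
Fragment = Dict[int, Tuple[int, float]]


def wmec_cost(haplotype: Sequence[int], fragments: List[Fragment]) -> float:
    """Weighted MEC cost of the bipartition induced by `haplotype` (H1) and
    its complement (H2), assuming every site is heterozygous."""
    total = 0.0
    for frag in fragments:
        # Weight of the allele flips needed for this read to agree with H1 ...
        to_h1 = sum(w for pos, (allele, w) in frag.items() if allele != haplotype[pos])
        # ... and with H2, which differs from H1 at every site.
        to_h2 = sum(w for pos, (allele, w) in frag.items() if allele == haplotype[pos])
        # Each read is assigned to whichever haplotype is cheaper to reach.
        total += min(to_h1, to_h2)
    return total


def exhaustive_wmec(n_sites: int, fragments: List[Fragment]) -> Tuple[List[int], float]:
    """Brute-force minimiser over all 2**n_sites haplotypes; only feasible for
    tiny inputs, which is why heuristic searches are used in practice."""
    best = min(product((0, 1), repeat=n_sites), key=lambda h: wmec_cost(h, fragments))
    return list(best), wmec_cost(best, fragments)
```

For three mutually consistent reads such as `[{0: (0, 1.0), 1: (0, 1.0)}, {1: (0, 1.0), 2: (1, 1.0)}, {0: (0, 1.0), 2: (1, 1.0)}]`, both `[0, 0, 1]` and its complement `[1, 1, 0]` have cost 0.0, illustrating that a haplotype and its complement describe the same bipartition.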
Affiliation(s)
- Andrea Tangherloni
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
- Simone Spolaor
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
- Leonardo Rundo
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
- Institute of Molecular Bioimaging and Physiology, Italian National Research Council, Contrada Pietrapollastra-Pisciotto, Cefalù (PA), 90015, Italy
- Marco S Nobile
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
- SYSBIO.IT Centre of Systems Biology, Piazza della Scienza 2, Milan, 20126, Italy
- Paolo Cazzaniga
- Department of Human and Social Sciences, University of Bergamo, Piazzale Sant'Agostino 2, Bergamo, 24129, Italy
- SYSBIO.IT Centre of Systems Biology, Piazza della Scienza 2, Milan, 20126, Italy
- Giancarlo Mauri
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
- SYSBIO.IT Centre of Systems Biology, Piazza della Scienza 2, Milan, 20126, Italy
- Pietro Liò
- Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
- Ivan Merelli
- Institute of Biomedical Technologies, Italian National Research Council, Via Fratelli Cervi 93, Segrate (MI), 20090, Italy
- Daniela Besozzi
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
6
Tilleman L, Weymaere J, Heindryckx B, Deforce D, Nieuwerburgh FV. Contemporary pharmacogenetic assays in view of the PharmGKB database. Pharmacogenomics 2019; 20:261-272. [PMID: 30883266 DOI: 10.2217/pgs-2018-0167] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 02/05/2023]
Abstract
AIM Six modern PGx assays were compared with the Pharmacogenomics Knowledge Base (PharmGKB) to determine the proportion of the currently known PGx genotypes that are assessed by these assays. MATERIALS & METHODS The investigated assays were 'Ion AmpliSeq Pharmacogenomics', 'iPLEX PGx Pro', 'DMET Plus', 'PharmacoScan', 'Living DNA' and '23andMe'. RESULTS PharmGKB contains 3474 clinical annotations, of which 75, 70 and 45% can be determined by PharmacoScan, Living DNA and 23andMe, respectively. The other assays are designed to test a specific subset of PGx variants. CONCLUSION Assaying all known PGx variants would occupy only a minor fraction of the current assays' capacity; unfortunately, this is not achieved. Moreover, the selected variants are not necessarily those with the largest effects or the strongest evidence.
Affiliation(s)
- Laurentijn Tilleman
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
- Jana Weymaere
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
- Björn Heindryckx
- Ghent-Fertility & Stem Cell Team (G-FaST), Department for Reproductive Medicine, Ghent University Hospital, Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Dieter Deforce
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
- Filip Van Nieuwerburgh
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium