1
Barbitoff YA, Ushakov MO, Lazareva TE, Nasykhova YA, Glotov AS, Predeus AV. Bioinformatics of germline variant discovery for rare disease diagnostics: current approaches and remaining challenges. Brief Bioinform 2024; 25:bbad508. [PMID: 38271481 PMCID: PMC10810331 DOI: 10.1093/bib/bbad508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/18/2023] [Accepted: 12/12/2023] [Indexed: 01/27/2024] Open
Abstract
Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.
Affiliation(s)
- Yury A Barbitoff
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
  - Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
- Mikhail O Ushakov
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Tatyana E Lazareva
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Yulia A Nasykhova
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Andrey S Glotov
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Alexander V Predeus
  - Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
2
Silva-Alarcon S, Valencia C, Newball L, Saldarriaga W, Castillo A. Molecular Variants in Genes related to the Response to Ocular Hypotensive Drugs in an Afro-Colombian Population. Open Ophthalmol J 2022. [DOI: 10.2174/18743641-v16-e2205250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
Abstract
Aims:
This study aimed to conduct an exploratory analysis of the pharmacogenomic variants involved in the response to ocular hypotensive drugs, in order to understand individual differences in response in an Afro-descendant population.
Background:
Glaucoma is the leading cause of irreversible blindness worldwide. The pharmacologic treatment available consists of lowering intraocular pressure by administering topical drugs. In Asian and Caucasian people, pharmacogenomic variants associated with the efficacy of these treatments have been identified. However, in Afro-descendant populations, there is a profound gap in this knowledge.
Objective:
This study identified the pharmacogenomic variants related to the efficacy of ocular hypotensive treatment in Afro-descendant individuals from the Archipelago of San Andres and Providence, Colombia.
Methods:
Whole-exome sequencing (WES) data were analyzed, with functional annotation and assessment of clinical significance for pharmacogenomic variants reported in the PharmGKB database; in addition, an in silico prediction analysis was carried out for the novel variants.
Results:
We identified six out of 18 non-synonymous variants with a clinical annotation in PharmGKB. Five were classified as level three evidence for the hypotensive drugs, including rs1801252 and rs1801253 in the ADRB1 gene and rs1042714 in the ADRB2 gene. These pharmacogenomic variants have been implicated in a lack of efficacy of topical beta-blockers and in higher systolic and diastolic pressure under treatment with ophthalmic timolol. The rs1045642 variant in the ABCB1 gene was associated with greater efficacy of latanoprost treatment. We also found the haplotypes *17 for CYP2D6 and *10 for CYP2C19, both associated with reduced enzyme activity in timolol metabolism. In addition, we observed 50 novel potentially actionable variants: 36 synonymous, two insertions causing frameshift mutations, and 12 non-synonymous, of which five were predicted to be pathogenic by several pathogenicity predictors.
Conclusion:
Our results suggest that the pharmacogenomic variants found may decrease the efficacy of ocular hypotensive treatment in a Colombian Afro-descendant population, and they reveal a significant proportion of novel variants with the potential to influence drug response.
3
Kaminow B, Ballouz S, Gillis J, Dobin A. Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses. Genome Res 2022; 32:738-749. [PMID: 35256454 PMCID: PMC8997357 DOI: 10.1101/gr.275613.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Accepted: 03/02/2022] [Indexed: 11/25/2022]
Abstract
The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. In order to find the best haploid genome representation, we constructed consensus genomes at the pan-human, super-population, and population levels, utilizing variant information from the 1000 Genomes Project. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of ~2-3 when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no improvement over using the pan-human consensus, suggesting a limit in the utility of incorporating more specific genomic variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expression of isoforms, genes, and splice junctions.
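As a rough illustration of the consensus idea described in this abstract, the sketch below replaces the reference base with the alternate allele wherever the alternate allele is the more frequent one in a population. It handles only single-nucleotide variants on a toy sequence. The function name `build_consensus`, the tuple layout of a variant, and the 0.5 frequency threshold are assumptions made for illustration and are not taken from the authors' pipeline.

```python
from typing import Iterable, Tuple

# Hypothetical variant record (an assumption for this sketch):
# (0-based position, reference base, alternate base, alternate allele frequency)
Variant = Tuple[int, str, str, float]


def build_consensus(reference: str, variants: Iterable[Variant],
                    threshold: float = 0.5) -> str:
    """Return a copy of `reference` carrying the major allele at each SNV site."""
    bases = list(reference)
    for pos, ref, alt, alt_freq in variants:
        if len(ref) != 1 or len(alt) != 1:
            # Indels would shift coordinates; this sketch skips them.
            continue
        if bases[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        if alt_freq > threshold:
            bases[pos] = alt
    return "".join(bases)


toy_reference = "ACGTACGT"
toy_variants = [
    (1, "C", "T", 0.80),  # alternate allele is the major allele: replaced
    (4, "A", "G", 0.10),  # alternate allele is rare: reference base kept
]
consensus = build_consensus(toy_reference, toy_variants)
print(consensus)
```

Restricting the frequency input to one super-population or population would give the more specific consensuses the abstract compares against the pan-human one.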
Affiliation(s)
- Benjamin Kaminow
  - Cold Spring Harbor Laboratory; Weill Cornell Graduate School of Medical Sciences
- Sara Ballouz
  - Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research; School of Medical Sciences, University of New South Wales; Cold Spring Harbor Laboratory
4
Samaha G, Wade CM, Mazrier H, Grueber CE, Haase B. Exploiting genomic synteny in Felidae: cross-species genome alignments and SNV discovery can aid conservation management. BMC Genomics 2021; 22:601. [PMID: 34362297 PMCID: PMC8348863 DOI: 10.1186/s12864-021-07899-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2020] [Accepted: 07/14/2021] [Indexed: 11/10/2022] Open
Abstract
Background:
While recent advances in genomics have enabled vast improvements in the quantification of genome-wide diversity and the identification of adaptive and deleterious alleles in model species, wildlife and non-model species have largely not reaped the same benefits. This has been attributed to the resources and infrastructure required to develop essential genomic datasets such as reference genomes. In the absence of a high-quality reference genome, cross-species alignments can provide reliable, cost-effective methods for single nucleotide variant (SNV) discovery. Here, we demonstrated the utility of cross-species genome alignment methods in gaining insights into population structure and functional genomic features in cheetah (Acinonyx jubatus), snow leopard (Panthera uncia) and Sumatran tiger (Panthera tigris sumatrae), relative to the domestic cat (Felis catus).
Results:
Alignment of big cats to the domestic cat reference assembly yielded nearly complete sequence coverage of the reference genome. From this, 38,839,061 variants in cheetah, 15,504,143 in snow leopard and 13,414,953 in Sumatran tiger were discovered and annotated. This method was able to delineate population structure but was limited in its ability to adequately detect rare variants. Enrichment analysis of fixed and species-specific SNVs revealed insights into adaptive traits, evolutionary history and the pathogenesis of heritable diseases.
Conclusions:
The high degree of synteny among felid genomes enabled the successful application of the domestic cat reference in high-quality SNV detection. The datasets presented here provide a useful resource for future studies into population dynamics, evolutionary history and genetic and disease management of big cats. This cross-species method of variant discovery provides genomic context for identifying annotated gene regions essential to understanding adaptive and deleterious variants that can improve conservation outcomes.
Supplementary Information:
The online version contains supplementary material available at 10.1186/s12864-021-07899-2.
Affiliation(s)
- Georgina Samaha
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, NSW, Australia
- Claire M Wade
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
- Hamutal Mazrier
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, NSW, Australia
- Catherine E Grueber
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
- Bianca Haase
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, NSW, Australia
5
Hamdi Y, Zass L, Othman H, Radouani F, Allali I, Hanachi M, Okeke CJ, Chaouch M, Tendwa MB, Samtal C, Mohamed Sallam R, Alsayed N, Turkson M, Ahmed S, Benkahla A, Romdhane L, Souiai O, Tastan Bishop Ö, Ghedira K, Mohamed Fadlelmola F, Mulder N, Kamal Kassim S. Human OMICs and Computational Biology Research in Africa: Current Challenges and Prospects. OMICS: A Journal of Integrative Biology 2021; 25:213-233. [PMID: 33794662 DOI: 10.1089/omi.2021.0004] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
Following the publication of the first human genome, OMICs research, including genomics, transcriptomics, proteomics, and metagenomics, has been on the rise. OMICs studies revealed the complex genetic diversity among human populations and challenged our understanding of genotype-phenotype correlations. Africa, being the cradle of the first modern humans, is distinguished by a large genetic diversity within its populations and a rich ethnolinguistic history. However, the available human OMICs tools and databases are not representative of this diversity, therefore creating significant gaps in biomedical research. African scientists, students, and publics are among the key contributors to OMICs systems science. This expert review examines the pressing issues in human OMICs research, education, and development in Africa, as seen through a lens of computational biology, public health relevant technology innovation, critically informed science governance, and how best to harness OMICs data to benefit health and societies in Africa and beyond. We underscore the disparities between North and Sub-Saharan Africa at different levels. A harmonized African ethnolinguistic classification would help address annotation challenges associated with population diversity. Finally, building on the existing strategic research initiatives, such as the H3Africa and H3ABioNet Consortia, we highly recommend addressing large-scale multidisciplinary research challenges, strengthening research collaborations and knowledge transfer, and enhancing the ability of African researchers to influence and shape national and international research, policy, and funding agendas. This article and analysis contribute to a deeper understanding of past and current challenges in the African OMICs innovation ecosystem, while also offering foresight on future innovation trajectories.
Affiliation(s)
- Yosr Hamdi
  - Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
  - Laboratory of Human and Experimental Pathology, Institut Pasteur de Tunis, Tunis, Tunisia
- Lyndon Zass
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Houcemeddine Othman
  - Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Fouzia Radouani
  - Chlamydiae and Mycoplasmas Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
- Imane Allali
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
  - Laboratory of Human Pathologies Biology, Department of Biology, Faculty of Sciences, and Genomic Center of Human Pathologies, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, Rabat, Morocco
- Mariem Hanachi
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
  - Faculty of Science of Bizerte, Zarzouna, University of Carthage, Tunis, Tunisia
- Chiamaka Jessica Okeke
  - Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
- Melek Chaouch
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Maureen Bilinga Tendwa
  - Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
- Chaimae Samtal
  - Laboratory of Biotechnology, Environment, Agri-food and Health, Faculty of Sciences Dhar El Mahraz-Sidi Mohammed Ben Abdellah University, Fez, Morocco
  - University of Mohamed Premier, Oujda, Morocco
- Reem Mohamed Sallam
  - Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Department of Basic Medical Sciences, Faculty of Medicine, Galala University, Suez, Egypt
- Nihad Alsayed
  - Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Michael Turkson
  - The National Institute for Mathematical Sciences, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
- Samah Ahmed
  - Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Alia Benkahla
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Lilia Romdhane
  - Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
  - Faculty of Science of Bizerte, Zarzouna, University of Carthage, Tunis, Tunisia
- Oussema Souiai
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Özlem Tastan Bishop
  - Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
- Kais Ghedira
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Faisal Mohamed Fadlelmola
  - Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Nicola Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Samar Kamal Kassim
  - Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
6
Chen NC, Solomon B, Mun T, Iyer S, Langmead B. Reference flow: reducing reference bias using multiple population genomes. Genome Biol 2021. [PMID: 33397413 DOI: 10.1101/2020.03.03.975219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/15/2023] Open
Abstract
Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
Affiliation(s)
- Nae-Chyun Chen
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Brad Solomon
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Taher Mun
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Sheila Iyer
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
7
Chen NC, Solomon B, Mun T, Iyer S, Langmead B. Reference flow: reducing reference bias using multiple population genomes. Genome Biol 2021; 22:8. [PMID: 33397413 PMCID: PMC7780692 DOI: 10.1186/s13059-020-02229-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 03/09/2020] [Accepted: 12/08/2020] [Indexed: 12/30/2022] Open
Abstract
Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
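One way an alignment scheme built on multiple population reference genomes could be organized is sketched below. This is a toy illustration, not the authors' implementation: the split into a first pass against a single genome and a second pass against population genomes for poorly matching reads, the mismatch threshold, and the names `best_hit` and `two_pass_align` are all assumptions. The ungapped mismatch-counting aligner stands in for a real read aligner, and the genomes are assumed to share coordinates (no indels), so no lift-over step is needed.

```python
from typing import Dict, Tuple


def best_hit(read: str, genome: str) -> Tuple[int, int]:
    """Toy ungapped aligner: return (position, mismatches) of the best placement."""
    best = (-1, len(read) + 1)
    for start in range(len(genome) - len(read) + 1):
        window = genome[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best[1]:
            best = (start, mismatches)
    return best


def two_pass_align(read: str,
                   first_pass: Tuple[str, str],
                   second_pass: Dict[str, str],
                   max_mismatches: int = 0) -> Tuple[str, int, int]:
    """Align to one genome first; retry on population genomes if the hit is poor.

    Returns (genome name, position, mismatches) of the chosen alignment.
    """
    name, genome = first_pass
    pos, mismatches = best_hit(read, genome)
    chosen = (name, pos, mismatches)
    if mismatches > max_mismatches:
        # Only reads that align poorly in the first pass reach the second pass.
        for other_name, other_genome in second_pass.items():
            other_pos, other_mm = best_hit(read, other_genome)
            if other_mm < chosen[2]:
                chosen = (other_name, other_pos, other_mm)
    return chosen


first = ("major", "ACGTACGTAA")
populations = {"popA": "ACGTTCGTAA", "popB": "ACGAACGTAA"}

easy = two_pass_align("ACGTAC", first, populations)
hard = two_pass_align("GTTCGT", first, populations)
print(easy, hard)
```

In this sketch most reads are settled in the first pass, so the extra genomes are consulted only for the minority of reads that carry variation missing from the first genome.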
Affiliation(s)
- Nae-Chyun Chen
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Brad Solomon
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Taher Mun
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Sheila Iyer
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA