1. Rodilla C, Núñez-Moreno G, Benitez Y, Rodríguez de Alba M, Blanco-Kelly F, López-Alcojor A, Fernández-Caballero L, Perea-Romero I, Del Pozo-Valero M, García-García G, Balanzá M, Villaverde C, Zurita O, Jubin C, Fund C, Delepine M, Leduc A, Deleuze JF, Millán JM, Minguez P, Corton M, Ayuso C. Long-Read Whole-Genome Sequencing as a Tool for Variant Detection in Inherited Retinal Dystrophies. Int J Mol Sci 2025; 26:3825. [PMID: 40332496 PMCID: PMC12027592 DOI: 10.3390/ijms26083825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2025] [Revised: 04/07/2025] [Accepted: 04/15/2025] [Indexed: 05/08/2025]
Abstract
Advances in whole-genome sequencing (WGS) have significantly enhanced our ability to detect genomic variants underlying inherited diseases. In this study, we performed long-read WGS on 24 patients with inherited retinal dystrophies (IRDs) to validate the utility of nanopore sequencing in detecting genomic variations. We confirmed the presence of all previously detected variants and demonstrated that this approach allows for the precise refinement of structural variants (SVs). Furthermore, we could perform genotype phasing by sequencing only the probands, confirming that the variants were inherited in trans. Moreover, nanopore sequencing enables the detection of complex variants, such as transposon insertions and structural rearrangements. This comprehensive assessment illustrates the power of long-read sequencing in capturing diverse forms of genomic variation and in improving diagnostic accuracy in IRDs.
Affiliation(s)
- Cristina Rodilla
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Gonzalo Núñez-Moreno
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Bioinformatics Unit, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
- Yolanda Benitez
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Bioinformatics Unit, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
- Marta Rodríguez de Alba
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Fiona Blanco-Kelly
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Aroa López-Alcojor
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
- Lidia Fernández-Caballero
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Irene Perea-Romero
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Marta Del Pozo-Valero
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Gema García-García
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Molecular, Cellular and Genomics Biomedicine, Health Research Institute La Fe, 46026 Valencia, Spain
- Mar Balanzá
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Molecular, Cellular and Genomics Biomedicine, Health Research Institute La Fe, 46026 Valencia, Spain
- Cristina Villaverde
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Olga Zurita
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Claire Jubin
  - Centre National de Recherche en Génomique Humaine, Université Paris-Saclay, 91057 Evry, France
- Cedric Fund
  - Centre National de Recherche en Génomique Humaine, Université Paris-Saclay, 91057 Evry, France
- Marc Delepine
  - Centre National de Recherche en Génomique Humaine, Université Paris-Saclay, 91057 Evry, France
- Aurelie Leduc
  - Centre National de Recherche en Génomique Humaine, Université Paris-Saclay, 91057 Evry, France
- Jean-François Deleuze
  - Centre National de Recherche en Génomique Humaine, Université Paris-Saclay, 91057 Evry, France
- José M. Millán
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Molecular, Cellular and Genomics Biomedicine, Health Research Institute La Fe, 46026 Valencia, Spain
- Pablo Minguez
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Bioinformatics Unit, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
- Marta Corton
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Carmen Ayuso
  - Department of Genetics & Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
  - Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
2. Yang Q, Sun J, Wang X, Wang J, Liu Q, Ru J, Zhang X, Wang S, Hao R, Bian P, Dai X, Gong M, Zhang Z, Wang A, Bai F, Li R, Cai Y, Jiang Y. SVLearn: a dual-reference machine learning approach enables accurate cross-species genotyping of structural variants. Nat Commun 2025; 16:2406. [PMID: 40069188 PMCID: PMC11897243 DOI: 10.1038/s41467-025-57756-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Accepted: 03/04/2025] [Indexed: 03/15/2025]
Abstract
Structural variations (SVs) are diverse forms of genetic alterations and drive a wide range of human diseases. Accurately genotyping SVs from short-read sequencing data, particularly those in repetitive genomic regions, remains challenging. Here, we introduce SVLearn, a machine-learning approach for genotyping bi-allelic SVs. It exploits a dual-reference strategy to engineer a curated set of genomic, alignment, and genotyping features based on a reference genome in concert with an allele-based alternative genome. Using 38,613 human-derived SVs, we show that SVLearn significantly outperforms four state-of-the-art tools, with precision improvements of up to 15.61% for insertions and 13.75% for deletions in repetitive regions. On two additional sets of 121,435 cattle SVs and 113,042 sheep SVs, SVLearn generalizes well to cross-species SV genotyping, with a weighted genotype concordance score of up to 90%. Notably, SVLearn enables accurate genotyping of SVs at low sequencing coverage, with accuracy comparable to that at 30× coverage. Our studies suggest that SVLearn can accelerate the understanding of associations between genome-scale, high-quality genotyped SVs and diseases across multiple species.
Affiliation(s)
- Qimeng Yang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Jianfeng Sun
  - Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Xinyu Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Jiong Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Jinlong Ru
  - Institute of Virology, Helmholtz Centre Munich - German Research Centre for Environmental Health, Neuherberg, Germany
- Xin Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Sizhe Wang
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Ran Hao
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Peipei Bian
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Xuelei Dai
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
  - Yazhouwan National Laboratory, Sanya, Hainan, China
- Mian Gong
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
  - State Key Laboratory of Animal Biotech Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- Zhuangbiao Zhang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Ao Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Fengting Bai
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Ran Li
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Yudong Cai
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Yu Jiang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
3. Abondio P, Bruno F, Passarino G, Montesanto A, Luiselli D. Pangenomics: A new era in the field of neurodegenerative diseases. Ageing Res Rev 2024; 94:102180. [PMID: 38163518 DOI: 10.1016/j.arr.2023.102180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 12/14/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024]
Abstract
A pangenome is composed of all the genetic variability of a group of individuals, and its application to the study of neurodegenerative diseases may provide valuable insights into the underlying aspects of genetic heterogeneity for these complex ailments, including gene expression, epigenetics, and translation mechanisms. Furthermore, a reference pangenome allows for the identification of previously undetected structural commonalities and differences among individuals, which may help in the diagnosis of a disease, support the prediction of what will happen over time (prognosis) and aid in developing novel treatments in the perspective of personalized medicine. Therefore, in the present review, the application of the pangenome concept to the study of neurodegenerative diseases will be discussed and analyzed for its potential to enable an improvement in diagnosis and prognosis for these illnesses, leading to the development of tailored treatments for individual patients from the knowledge of the genomic composition of a whole population.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Francesco Bruno
  - Academy of Cognitive Behavioral Sciences of Calabria (ASCoC), Lamezia Terme, Italy
  - Regional Neurogenetic Centre (CRN), Department of Primary Care, Azienda Sanitaria Provinciale Di Catanzaro, Viale A. Perugini, 88046 Lamezia Terme, CZ, Italy
  - Association for Neurogenetic Research (ARN), Lamezia Terme, CZ, Italy
- Giuseppe Passarino
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
- Alberto Montesanto
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
4. Deng CH, Naithani S, Kumari S, Cobo-Simón I, Quezada-Rodríguez EH, Skrabisova M, Gladman N, Correll MJ, Sikiru AB, Afuwape OO, Marrano A, Rebollo I, Zhang W, Jung S. Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences. Database (Oxford) 2023; 2023:baad088. [PMID: 38079567 PMCID: PMC10712715 DOI: 10.1093/database/baad088] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/13/2023] [Revised: 10/17/2023] [Accepted: 11/28/2023] [Indexed: 12/18/2023]
Abstract
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data, and to understand the needs and challenges of the plant genomic research community. For 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we identify a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.
Affiliation(s)
- Cecilia H Deng
  - Molecular and Digital Breeding, New Cultivar Innovation, The New Zealand Institute for Plant and Food Research Limited, 120 Mt Albert Road, Auckland 1025, New Zealand
- Sushma Naithani
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Sunita Kumari
  - Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- Irene Cobo-Simón
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
  - Institute of Forest Science (ICIFOR-INIA, CSIC), Madrid, Spain
- Elsa H Quezada-Rodríguez
  - Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Ciudad de México, México
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, México
- Maria Skrabisova
  - Department of Biochemistry, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Nick Gladman
  - Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
  - U.S. Department of Agriculture-Agricultural Research Service, NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY 14853, USA
- Melanie J Correll
  - Agricultural and Biological Engineering Department, University of Florida, 1741 Museum Rd, Gainesville, FL 32611, USA
- Annarita Marrano
  - Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA 94560, USA
- Wentao Zhang
  - National Research Council Canada, 110 Gymnasium Pl, Saskatoon, Saskatchewan S7N 0W9, Canada
- Sook Jung
  - Department of Horticulture, Washington State University, 303c Plant Sciences Building, Pullman, WA 99164-6414, USA
5. Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 2023; 14:7805. [PMID: 38016949 PMCID: PMC10684511 DOI: 10.1038/s41467-023-43651-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/05/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023]
Abstract
Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV's superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org.
Affiliation(s)
- Zhuoran Xu
  - Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Quan Li
  - Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
- Luigi Marchionni
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
6. Abondio P, Cilli E, Luiselli D. Human Pangenomics: Promises and Challenges of a Distributed Genomic Reference. Life (Basel) 2023; 13:1360. [PMID: 37374141 DOI: 10.3390/life13061360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023]
Abstract
A pangenome is a collection of the common and unique genomes that are present in a given species. It combines the genetic information of all the genomes sampled, resulting in a large and diverse range of genetic material. Pangenomic analysis offers several advantages compared to traditional genomic research. For example, a pangenome is not bound by the physical constraints of a single genome, so it can capture more genetic variability. Thanks to the pangenome concept, it is possible to use exceedingly detailed sequence data to study the evolutionary history of two different species, or how populations within a species differ genetically. In the wake of the Human Pangenome Project, this review discusses the advantages of the pangenome for studying human genetic variation and frames them around how pangenomic data can inform population genetics, phylogenetics, and public health policy, by providing insights into the genetic basis of diseases or by guiding personalized treatments that target the specific genetic profile of an individual. Moreover, technical limitations, ethical concerns, and legal considerations are discussed.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Elisabetta Cilli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
7. Lu N, Qiao Y, Lu Z, Tu J. Chimera: The spoiler in multiple displacement amplification. Comput Struct Biotechnol J 2023; 21:1688-1696. [PMID: 36879882 PMCID: PMC9984789 DOI: 10.1016/j.csbj.2023.02.034] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/25/2022] [Revised: 02/18/2023] [Accepted: 02/18/2023] [Indexed: 02/24/2023]
Abstract
Multiple displacement amplification (MDA), based on isothermal random priming and processive extension mediated by the high-fidelity phi29 DNA polymerase, has revolutionized the field of whole genome amplification by enabling the amplification of minute amounts of DNA, such as from a single cell, generating vast amounts of DNA with high genome coverage. Despite its advantages, MDA has its own challenges, one of the greatest being the formation of chimeric sequences (chimeras), which are present in all MDA products and seriously disturb downstream analysis. In this review, we provide a comprehensive overview of current research on MDA chimeras. We first review the mechanisms of chimera formation and chimera detection methods. We then systematically summarize the characteristics of chimeras, including overlap, chimeric distance, chimeric density, and chimeric rate, as found in independently published sequencing data. Finally, we review the methods used to process chimeric sequences and their impact on data utilization efficiency. The information presented in this review will be useful for those interested in understanding the challenges with MDA and in improving its performance.
Affiliation(s)
- Na Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Yi Qiao
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Zuhong Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Jing Tu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
8. Pokrovac I, Pezer Ž. Recent advances and current challenges in population genomics of structural variation in animals and plants. Front Genet 2022; 13:1060898. [PMID: 36523759 PMCID: PMC9745067 DOI: 10.3389/fgene.2022.1060898] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Accepted: 11/15/2022] [Indexed: 05/02/2024]
Abstract
The field of population genomics has seen a surge of studies on genomic structural variation over the past two decades. These studies have shown that structural variation is taxonomically ubiquitous and represents a dominant form of genetic variation within species. Recent advances in technology, especially the development of long-read sequencing platforms, have enabled the discovery of structural variants (SVs) in previously inaccessible genomic regions, which unlocked additional structural variation for population studies and revealed that more SVs contribute to evolution than previously perceived. An increasing number of studies suggest that SVs of all types and sizes may have a large effect on phenotype and consequently a major impact on rapid adaptation, population divergence, and speciation. However, the functional effect of the vast majority of SVs is unknown, and the field generally lacks evidence on the phenotypic consequences of most SVs that are suggested to have adaptive potential. Non-human genomes are heavily under-represented in population-scale studies of SVs. We argue that more research on other species is needed to objectively estimate the contribution of SVs to evolution. We discuss technical challenges associated with SV detection and outline the most recent advances towards more representative reference genomes, which open a new era in population-scale studies of structural variation.
Affiliation(s)
- Željka Pezer
  - Laboratory for Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia