1
Cui X, Liu Y, Sun M, Zhao Q, Huang Y, Zhang J, Yao Q, Yin H, Zhang H, Mo F, Zhong H, Liu Y, Chen X, Zhang Y, Liu J, Qiu Y, Feng M, Chen X, Ghanizadeh H, Zhou Y, Wang A. The nature of complex structural variations in tomatoes. Horticulture Research 2025; 12:uhaf107. [PMID: 40406505 PMCID: PMC12096311 DOI: 10.1093/hr/uhaf107] [Received: 01/31/2025] [Accepted: 04/06/2025] [Indexed: 05/26/2025]
Abstract
Structural variations (SVs) in repetitive sequences could only be detected within a broad region due to imprecise breakpoints, leading to classification errors and inaccurate trait analysis. Through manual inspection at 4532 variant regions identified by integrating 14 detection pipelines between two tomato genomes, we generated an SV benchmark at base-pair resolution. Evaluation of all pipelines yielded F1-scores below 53.77% with this benchmark, underscoring the urgent need for advanced detection algorithms in plant genomics. Analyzing the alignment features of the repetitive sequences in each region, we summarized four patterns of SV breakpoints and revealed that deviations in breakpoint identification were primarily due to copy misalignment. According to the similarities among copies, we identified 1635 bona fide SVs with precise breakpoints, including substitutions (223), which should be taken as a fundamental SV type, alongside insertions (780), deletions (619), and inversions (13), all showing preferences for SV occurrence within AT-repeat regions of regulatory loci. This precise resolution of complex SVs will foster genome analysis and crop improvement.
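The F1-score evaluation mentioned in this abstract can be illustrated with a minimal sketch. It assumes each SV is a hypothetical `(chrom, start, end, svtype)` tuple and that a call matches a benchmark SV only when both breakpoints agree within `tolerance` base pairs; the matching rule and the function names are illustrative assumptions, not the authors' evaluation procedure.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    called = true_positives + false_positives
    truth = true_positives + false_negatives
    precision = true_positives / called if called else 0.0
    recall = true_positives / truth if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_calls(benchmark, calls, tolerance=0):
    """Score SV calls against a benchmark of (chrom, start, end, svtype) tuples.

    Assumption: a call is a true positive when chromosome and type agree and
    both breakpoints lie within `tolerance` bp of an unmatched benchmark SV.
    tolerance=0 corresponds to base-pair resolution.
    """
    unmatched = list(benchmark)
    tp = 0
    for chrom, start, end, svtype in calls:
        for truth in unmatched:
            t_chrom, t_start, t_end, t_type = truth
            if (chrom == t_chrom and svtype == t_type
                    and abs(start - t_start) <= tolerance
                    and abs(end - t_end) <= tolerance):
                unmatched.remove(truth)  # each benchmark SV is matched once
                tp += 1
                break
    fp = len(calls) - tp
    fn = len(unmatched)
    return f1_score(tp, fp, fn)


# Toy data, invented for illustration only.
benchmark = [("chr1", 100, 200, "DEL"), ("chr1", 500, 500, "INS")]
calls = [("chr1", 100, 200, "DEL"), ("chr1", 490, 490, "INS"), ("chr2", 10, 20, "DEL")]
strict = evaluate_calls(benchmark, calls, tolerance=0)
relaxed = evaluate_calls(benchmark, calls, tolerance=10)
```

In this toy case the insertion call is 10 bp off, so it counts as an error under exact matching and as a hit under a 10 bp tolerance, which shows how imprecise breakpoints alone can lower a pipeline's F1-score.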
Affiliation(s)
- Xue Cui
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Yuxin Liu
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Miao Sun
  - State Key Laboratory of Forage Breeding-by-Design and Utilization, Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Qiyue Zhao
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Yicheng Huang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen Key Laboratory of Agricultural Synthetic Biology, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Jianwei Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Qiulin Yao
  - Wuhan Jianbing Technology Co., Ltd., Wuhan, China
- Hang Yin
  - State Key Laboratory of Forage Breeding-by-Design and Utilization, Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Huixin Zhang
  - College of Life Sciences, Northeast Agricultural University, Harbin 150030, China
- Fulei Mo
  - College of Life Sciences, Northeast Agricultural University, Harbin 150030, China
- Hongbin Zhong
  - Shenzhen CEM Biomedical Technology Ltd., Shenzhen, China
- Yang Liu
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Xiuling Chen
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Yao Zhang
  - College of Life Sciences, Northeast Agricultural University, Harbin 150030, China
- Jiayin Liu
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Youwen Qiu
  - College of Life Sciences, Northeast Agricultural University, Harbin 150030, China
- Mingfang Feng
  - College of Life Sciences, Northeast Agricultural University, Harbin 150030, China
- Xu Chen
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Hossein Ghanizadeh
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Yao Zhou
  - State Key Laboratory of Forage Breeding-by-Design and Utilization, Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - Academician Workstation of Agricultural High-tech Industrial Area of the Yellow River Delta, National Center of Technology Innovation for Comprehensive Utilization of Saline-Alkali Land, Dongying 257300, China
- Aoxue Wang
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
2
Guo F, Li Y, Zhao H, Liu X, Mao J, Ma D, Liu S. GKNnet: an relational graph convolutional network-based method with knowledge-augmented activation layer for microbial structural variation detection. Brief Bioinform 2025; 26:bbaf200. [PMID: 40324334 PMCID: PMC12052243 DOI: 10.1093/bib/bbaf200] [Received: 11/13/2024] [Revised: 03/09/2025] [Accepted: 04/10/2025] [Indexed: 05/07/2025]
Abstract
Structural variants (SVs) in microbial genomes play a critical role in phenotypic changes, environmental adaptation, and species evolution, with deletion variations particularly closely linked to phenotypic traits. Therefore, accurate and comprehensive identification of deletion variations is essential. Although long-read sequencing technology can detect more SVs, its high error rate introduces substantial noise, leading to high false-positive and low recall rates in existing SV detection algorithms. This paper presents an SV detection method based on graph convolutional networks (GCNs). The model first represents node features through a heterogeneous graph, leveraging the GCN to precisely identify variant regions. Additionally, a knowledge-augmented activation layer (KANLayer) with a learnable activation function is introduced to reduce noise around variant regions, thereby improving model precision and reducing false positives. A clustering algorithm then aggregates multiple overlapping regions near the variant center into a single accurate SV interval, further enhancing recall. Validation on both simulated and real datasets demonstrates that our method achieves superior F1 scores compared to benchmark methods (cuteSV, Sniffles, Svim, and Pbsv), highlighting its advantage and robustness in SV detection and offering an innovative solution for microbial genome structural variation research.
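The aggregation step described in this abstract, where multiple overlapping candidate regions are collapsed into a single SV interval, can be sketched as a plain interval merge. This is an illustrative stand-in rather than the clustering algorithm used in GKNnet, which the abstract does not specify; `merge_overlapping` and its `max_gap` parameter are hypothetical names.

```python
def merge_overlapping(regions, max_gap=0):
    """Merge (start, end) regions that overlap or lie within `max_gap` bp.

    Assumption: regions are on the same contig and start <= end for each.
    Returns a sorted list of merged (start, end) tuples.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + max_gap:
            # Extend the current interval to cover the overlapping region.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


# Toy candidate regions around two putative deletions (invented values).
candidates = [(140, 200), (100, 150), (195, 220), (500, 550)]
intervals = merge_overlapping(candidates)
```

Reporting one interval per group of overlapping candidates avoids counting the same event several times, which is the effect on recall that the abstract attributes to its clustering step.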
Affiliation(s)
- Fengyi Guo
  - School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 Lihu Avenue, Binhu District, Wuxi, Jiangsu 214122, China
- Yuanbo Li
  - School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 Lihu Avenue, Binhu District, Wuxi, Jiangsu 214122, China
- Hongyuan Zhao
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, State Key Laboratory of Food Science and Technology, School of Food Science and Technology, Jiangnan University, 1800 Lihu Avenue, Binhu District, Wuxi, Jiangsu 214122, China
- Xiaogang Liu
  - Luzhou Laojiao Group Co. Ltd, 157 Guojiao Road, Jiangyang District, Luzhou 646000, Sichuan, China
- Jian Mao
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, State Key Laboratory of Food Science and Technology, School of Food Science and Technology, Jiangnan University, 1800 Lihu Avenue, Binhu District, Wuxi, Jiangsu 214122, China
  - Shaoxing Key Laboratory of Traditional Fermentation Food and Human Health, Jiangnan University (Shaoxing) Industrial Technology Research Institute, Keqiao District, Shaoxing 312000, Zhejiang, China
- Dongna Ma
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, State Key Laboratory of Food Science and Technology, School of Food Science and Technology, Jiangnan University, 1800 Lihu Avenue, Binhu District, Wuxi, Jiangsu 214122, China
- Shuangping Liu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 Lihu Avenue, Binhu District, Wuxi, Jiangsu 214122, China
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, State Key Laboratory of Food Science and Technology, School of Food Science and Technology, Jiangnan University, 1800 Lihu Avenue, Binhu District, Wuxi, Jiangsu 214122, China
  - Luzhou Laojiao Group Co. Ltd, 157 Guojiao Road, Jiangyang District, Luzhou 646000, Sichuan, China
3
Gao R, Hu H, Jiang Z, Cao S, Wang G, Zhao Y, Jiang T. SVHunter: long-read-based structural variation detection through the transformer model. Brief Bioinform 2025; 26:bbaf203. [PMID: 40341921 PMCID: PMC12062572 DOI: 10.1093/bib/bbaf203] [Received: 11/11/2024] [Revised: 03/31/2025] [Accepted: 04/15/2025] [Indexed: 05/11/2025]
Abstract
Structural variations (SVs) are genomic rearrangements larger than 50 bp that are widely present in the human genome and are associated with various complex diseases. Existing long-read-based SV detection tools often rely on fixed rules or heuristic algorithms, which can oversimplify the complexity of SV signatures. Therefore, these methods usually lack flexibility and cannot fully capture SV signals, leading to reduced accuracy and robustness. To address these issues, we propose SVHunter, a transformer-based method for long-read SV detection. SVHunter combines convolutional neural networks and transformers to capture both local and global SV signatures, enabling accurate identification of SVs. Additionally, SVHunter employs the mean shift clustering algorithm, which dynamically adjusts bandwidth parameters to accommodate different types of SVs without requiring a preset number of clusters, thus allowing precise breakpoint clustering. Validation across multiple sequencing platforms and datasets demonstrates that SVHunter excels at detecting various types of SVs, with a notable reduction in the false discovery rate. This highlights its strong potential for both research and clinical applications.
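Mean shift clustering of breakpoint positions, as named in this abstract, can be sketched in one dimension. The version below uses a flat kernel and a single fixed `bandwidth`, which is a simplifying assumption: the dynamic bandwidth adjustment described for SVHunter is not reproduced, and the function name and return format are hypothetical.

```python
def mean_shift_1d(positions, bandwidth, max_iter=100, tol=1e-3):
    """Cluster 1-D breakpoint positions with flat-kernel mean shift.

    Each point is moved repeatedly to the mean of all input positions within
    `bandwidth` of it until the shifts fall below `tol`. Points whose final
    modes lie within `bandwidth` of each other are grouped together.
    No number of clusters has to be given in advance.
    """
    if not positions:
        return []
    modes = [float(p) for p in positions]
    for _ in range(max_iter):
        shifted = []
        for mode in modes:
            window = [p for p in positions if abs(p - mode) <= bandwidth]
            shifted.append(sum(window) / len(window))
        converged = max(abs(a - b) for a, b in zip(modes, shifted)) < tol
        modes = shifted
        if converged:
            break
    clusters = []
    for pos, mode in sorted(zip(positions, modes), key=lambda pair: pair[1]):
        if clusters and abs(mode - clusters[-1]["mode"]) <= bandwidth:
            clusters[-1]["members"].append(pos)
        else:
            clusters.append({"mode": mode, "members": [pos]})
    return clusters


# Toy breakpoint positions reported by individual reads (invented values).
breakpoints = [1000, 1002, 1005, 998, 5000, 5003, 4999]
clusters = mean_shift_1d(breakpoints, bandwidth=50)
```

With these toy positions the reads settle on two modes, one near 1001 and one near 5001, so two candidate breakpoints would be reported without a preset cluster count.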
Affiliation(s)
- Runtian Gao
  - College of Life Science, Northeast Forestry University, Harbin 150000, China
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- Heng Hu
  - College of Life Science, Northeast Forestry University, Harbin 150000, China
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- Zhongjun Jiang
  - College of Life Science, Northeast Forestry University, Harbin 150000, China
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- Shuqi Cao
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yuming Zhao
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- Tao Jiang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
4
Gao Y, Yang L, Kuhn K, Li W, Zanton G, Bowman M, Zhao P, Zhou Y, Fang L, Cole JB, Rosen BD, Ma L, Li C, Baldwin RL, Van Tassell CP, Zhang Z, Smith TPL, Liu GE. Long read and preliminary pangenome analyses reveal breed-specific structural variations and novel sequences in Holstein and Jersey cattle. J Adv Res 2025:S2090-1232(25)00258-9. [PMID: 40258473 DOI: 10.1016/j.jare.2025.04.014] [Received: 10/13/2024] [Revised: 04/06/2025] [Accepted: 04/10/2025] [Indexed: 04/23/2025]
Abstract
INTRODUCTION: Most SV studies in livestock rely on short-read sequencing, posing challenges in accurately characterizing large genomic variants due to their limited read length.
OBJECTIVES: Our goal is to reveal structural variation and novel sequences specific to Holstein and Jersey cattle breeds using long-read and pan-genome analyses.
METHODS: We sequenced 20 Holsteins and 8 Jersey cattle using PacBio HiFi to 20×, and integrated five read-based and one assembly-based SV caller to determine SVs.
RESULTS: We assembled the 28 genomes, averaging 3.25 Gb with a contig N50 of 69.36 Mb. Using the ARS-UCD1.2 reference, we acquired Holstein/Jersey SV catalogs with 74,068/54,689 events spanning 202/135 Mb (7.43%/4.97% of the genome). SVs were enriched in less conserved, non-coding, and non-regulatory regions. Comparing Holsteins with differing feed efficiency (FE), SVs unique to high FE were linked to energy metabolism and olfactory receptors, while those specific to low FE were associated with material transport. We constructed Holstein/Jersey pangenome graphs with 148,598/105,875 nodes and 208,891/147,990 edges, representing 47,028/37,137 biallelic and multi-allelic events, and 63.75/42.34 Mb of novel sequence. We observed SV count saturation with 20 Holsteins, while adding Jerseys significantly increased the SV count, highlighting breed-specific SV events.
CONCLUSION: Our long-read data and SV catalogs are valuable resources, revealing that the cattle genome is more complex than previously thought.
Affiliation(s)
- Yahui Gao
  - State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China; Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Liu Yang
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Kristen Kuhn
  - USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
- Wenli Li
  - US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
- Geoffrey Zanton
  - US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
- Mary Bowman
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Pengju Zhao
  - Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China.
- Yang Zhou
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
- Lingzhao Fang
  - Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark.
- John B Cole
  - Council on Dairy Cattle Breeding, 4201 Northview Dr, Bowie, MD 20716, USA; Department of Animal Sciences, Donald Henry Barron Reproductive and Perinatal Biology Research Program, and the Genetics Institute, University of Florida, Gainesville, FL 32611-0910, USA; Department of Animal Science, North Carolina State University, Raleigh, NC 27695-7621, USA.
- Benjamin D Rosen
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Li Ma
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Congjun Li
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Ransom L Baldwin
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Curtis P Van Tassell
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Zhe Zhang
  - State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
- Timothy P L Smith
  - USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
- George E Liu
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
5
Mahmoud M, Agustinho DP, Sedlazeck FJ. A Hitchhiker's Guide to long-read genomic analysis. Genome Res 2025; 35:545-558. [PMID: 40228901 PMCID: PMC12047252 DOI: 10.1101/gr.279975.124] [Indexed: 04/16/2025]
Abstract
Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant cost efficiency, scalability, and accuracy advancements have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.
Affiliation(s)
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Daniel P Agustinho
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
6
Qiu T, Li J, Guo Y, Jiang L, Tang J. SVEA: an accurate model for structural variation detection using multi-channel image encoding and enhanced AlexNet architecture. J Transl Med 2025; 23:221. [PMID: 39987107 PMCID: PMC11846410 DOI: 10.1186/s12967-025-06213-y] [Received: 11/25/2024] [Accepted: 02/06/2025] [Indexed: 02/24/2025]
Abstract
BACKGROUND: Structural variations (SVs) are a pervasive and impactful class of genetic variation within the genome, significantly influencing gene function, impacting human health, and contributing to disease. Recent advances in deep learning have shown promise for SV detection; however, current methods still encounter key challenges in effective feature extraction and accurately predicting complex variations.
METHODS: We introduce SVEA, an advanced deep learning model designed to address these challenges. SVEA employs a novel multi-channel image encoding approach that transforms SVs into multi-dimensional image formats, improving the model's ability to capture subtle genomic variations. Additionally, SVEA integrates multi-head self-attention mechanisms and multi-scale convolution modules, enhancing its ability to capture global context and multi-scale features. The model was trained and tested on a diverse range of genomic datasets to evaluate its accuracy and generalizability.
RESULTS: SVEA demonstrated superior performance in detecting complex SVs compared to existing methods, with improved accuracy across various genomic regions. The multi-channel encoding and advanced feature extraction techniques contributed to the model's enhanced ability to predict subtle and complex variations.
CONCLUSIONS: This study presents SVEA, a deep learning model incorporating advanced encoding and feature extraction techniques to enhance structural variation prediction. The model demonstrates high accuracy, outperforming existing methods by approximately 4%, while also identifying areas for further optimization.
Affiliation(s)
- Taixing Qiu
  - College of Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518055, China
- Jiawei Li
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518055, China
- Yan Guo
  - Department of Public Health Sciences, University of Miami, Miami, FL, 33136, USA
- Limin Jiang
  - Department of Public Health Sciences, University of Miami, Miami, FL, 33136, USA
- Jijun Tang
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518055, China
7
Hu M, Wan P, Chen C, Tang S, Chen J, Wang L, Chakraborty M, Zhou Y, Chen J, Gaut BS, Emerson J, Liao Y. Benchmarking, detection, and genotyping of structural variants in a population of whole-genome assemblies using the SVGAP pipeline. bioRxiv 2025:2025.02.07.637096. [PMID: 39975360 PMCID: PMC11839052 DOI: 10.1101/2025.02.07.637096] [Indexed: 02/21/2025]
Abstract
Comparisons of complete genome assemblies offer a direct procedure for characterizing all genetic differences among them. However, existing tools are often limited to specific aligners or optimized for specific organisms, narrowing their applicability, particularly for large and repetitive plant genomes. Here, we introduce SVGAP, a pipeline for structural variant (SV) discovery, genotyping, and annotation from high-quality genome assemblies at the population level. Through extensive benchmarks using simulated SV datasets at individual, population, and phylogenetic contexts, we demonstrate that SVGAP performs favorably relative to existing tools in SV discovery. Additionally, SVGAP is one of the few tools to address the challenge of genotyping SVs within large assembled genome samples, and it generates fully genotyped VCF files. Applying SVGAP to 26 maize genomes revealed hidden genomic diversity in centromeres, driven by abundant insertions of centromere-specific LTR-retrotransposons. The output of SVGAP is well-suited for pan-genome construction and facilitates the interpretation of previously unexplored genomic regions.
Affiliation(s)
- Ming Hu
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (South China), Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangdong 510642, China
  - These authors contributed equally to this work
- Penglong Wan
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (South China), Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangdong 510642, China
  - These authors contributed equally to this work
- Chengjie Chen
  - Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences & National Key Laboratory for Tropical Crop Breeding & Laboratory of Crop Gene Resources and Germplasm Enhancement in South China, Ministry of Agriculture and Rural Affairs & Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation of Hainan Province, Hainan, 571101, China
  - These authors contributed equally to this work
- Shuyuan Tang
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (South China), Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangdong 510642, China
- Jiahao Chen
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (South China), Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangdong 510642, China
- Liang Wang
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (South China), Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangdong 510642, China
- Mahul Chakraborty
  - Department of Biology, Texas A&M University, College Station, TX, 77843, USA
- Yongfeng Zhou
  - Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences & National Key Laboratory for Tropical Crop Breeding & Laboratory of Crop Gene Resources and Germplasm Enhancement in South China, Ministry of Agriculture and Rural Affairs & Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation of Hainan Province, Hainan, 571101, China
- Jinfeng Chen
  - State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Brandon S. Gaut
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, 92697, USA
- J.J. Emerson
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, 92697, USA
- Yi Liao
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (South China), Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangdong 510642, China
8
Wang S, Lin J, Jia P, Xu T, Li X, Liu Y, Xu D, Bush SJ, Meng D, Ye K. De novo and somatic structural variant discovery with SVision-pro. Nat Biotechnol 2025; 43:181-185. [PMID: 38519720 PMCID: PMC11825360 DOI: 10.1038/s41587-024-02190-7] [Received: 08/01/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024]
Abstract
Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SV comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches, in particular, the resolving of complex SVs is improved, with low Mendelian error rates, high sensitivity of low-frequency SVs and reduced false-positive rates compared with SV merging approaches.
Collapse
Affiliation(s)
- Songbo Wang
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Peng Jia
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Xiujuan Li
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Yuezhuangnan Liu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Dan Xu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Deyu Meng
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau
- Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
| | - Kai Ye
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Faculty of Science, Leiden University, Leiden, The Netherlands
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
| |
|
9
|
Chen X, Wei S, Sun C, Yi Z, Wang Z, Wu Y, Xu J, Tao J, Chen H, Zhang M, Jiang Y, Lv H, Huang C. Computational Tools for Studying Genome Structural Variation. OMICS: A Journal of Integrative Biology 2025; 29:36-48. [PMID: 39905890 DOI: 10.1089/omi.2024.0200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2025]
Abstract
Structural variation (SV) typically refers to alterations in DNA fragments at least 50 base pairs long in the human genome. It can alter thousands of DNA nucleotides and thus significantly influence human health, disease, and clinical phenotypes. There is a shared and growing recognition that the emergence of effective computational tools and high-throughput technologies such as short-read sequencing and long-read sequencing offers novel insight into SV and, by extension, diseases affecting planetary health. However, the many available SV tools vary in their strengths and weaknesses, which currently hampers scholars' ability to select the optimal tools for studying SVs. Here, we reviewed 175 tools developed in the past two decades for SV detection, annotation, visualization, and downstream analysis in human genomics. In this expert review, we provide a comprehensive catalog of SV-related tools across different technology platforms and summarize their features, strengths, and limitations with an eye to accelerating systems science and planetary health innovations.
Affiliation(s)
- Xingyu Chen
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Siyu Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Chen Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Zelin Yi
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
| | - Zihan Wang
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
| | - Yingyi Wu
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
| | - Jing Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Junxian Tao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Haiyan Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Mingming Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Yongshuai Jiang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Hongchao Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Chen Huang
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
| |
|
10
|
Hu H, Zhao J, Thomas WJW, Batley J, Edwards D. The role of pangenomics in orphan crop improvement. Nat Commun 2025; 16:118. [PMID: 39746989 PMCID: PMC11696220 DOI: 10.1038/s41467-024-55260-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Accepted: 12/05/2024] [Indexed: 01/04/2025] Open
Abstract
Global food security depends heavily on a few staple crops, while orphan crops, despite being less studied, offer the potential benefits of environmental adaptation and enhanced nutritional traits, especially in a changing climate. Major crops have benefited from genomics-based breeding, initially using single genomes and later pangenomes. Recent advances in DNA sequencing have enabled pangenome construction for several orphan crops, offering a more comprehensive understanding of genetic diversity. Orphan crop research has now entered the pangenomics era and applying these pangenomes with advanced selection methods and genome editing technologies can transform these neglected species into crops of broader agricultural significance.
Affiliation(s)
- Haifei Hu
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of Rice Science and Technology, Guangzhou, China
| | - Junliang Zhao
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of Rice Science and Technology, Guangzhou, China
| | - William J W Thomas
- School of Biological Sciences, University of Western Australia, Perth, WA, Australia
| | - Jacqueline Batley
- School of Biological Sciences, University of Western Australia, Perth, WA, Australia
| | - David Edwards
- School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
| |
|
11
|
Zhou B, Arthur JG, Guo H, Kim T, Huang Y, Pattni R, Wang T, Kundu S, Luo JXJ, Lee H, Nachun DC, Purmann C, Monte EM, Weimer AK, Qu PP, Shi M, Jiang L, Yang X, Fullard JF, Bendl J, Girdhar K, Kim M, Chen X, Greenleaf WJ, Duncan L, Ji HP, Zhu X, Song G, Montgomery SB, Palejev D, Zu Dohna H, Roussos P, Kundaje A, Hallmayer JF, Snyder MP, Wong WH, Urban AE. Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders. Cell 2024; 187:6687-6706.e25. [PMID: 39353437 DOI: 10.1016/j.cell.2024.09.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 07/01/2024] [Accepted: 09/10/2024] [Indexed: 10/04/2024]
Abstract
Complex structural variations (cxSVs) are often overlooked in genome analyses due to detection challenges. We developed ARC-SV, a probabilistic and machine-learning-based method that enables accurate detection and reconstruction of cxSVs from standard datasets. By applying ARC-SV across 4,262 genomes representing all continental populations, we identified cxSVs as a significant source of natural human genetic variation. Rare cxSVs have a propensity to occur in neural genes and loci that underwent rapid human-specific evolution, including those regulating corticogenesis. By performing single-nucleus multiomics in postmortem brains, we discovered cxSVs associated with differential gene expression and chromatin accessibility across various brain regions and cell types. Additionally, cxSVs detected in brains of psychiatric cases are enriched for linkage with psychiatric GWAS risk alleles detected in the same brains. Furthermore, our analysis revealed significantly decreased brain-region- and cell-type-specific expression of cxSV genes, specifically for psychiatric cases, implicating cxSVs in the molecular etiology of major neuropsychiatric disorders.
Affiliation(s)
- Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Joseph G Arthur
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
| | - Hanmin Guo
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Statistics, Stanford University, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Taeyoung Kim
- School of Computer Science and Engineering, Pusan National University, Busan 46241, South Korea
| | - Yiling Huang
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Reenal Pattni
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Tao Wang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Soumya Kundu
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Jay X J Luo
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University, Stanford, CA 94305, USA
| | - Daniel C Nachun
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
| | - Carolin Purmann
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Emma M Monte
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Annika K Weimer
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Ping-Ping Qu
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Minyi Shi
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Lixia Jiang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Xinqiong Yang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Kiran Girdhar
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Minsu Kim
- School of Computer Science and Engineering, Pusan National University, Busan 46241, South Korea
| | - Xi Chen
- Department of Statistics, Stanford University, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Laramie Duncan
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University, Stanford, CA 94305, USA
| | - Xiang Zhu
- Department of Statistics, Stanford University, Stanford, CA 94305, USA; Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA; Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
| | - Giltae Song
- School of Computer Science and Engineering, Pusan National University, Busan 46241, South Korea; Center for Artificial Intelligence Research, Pusan National University, Busan 46241, South Korea
| | - Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Pathology, Stanford University, Stanford, CA 94305, USA
| | - Dean Palejev
- Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia 1113, Bulgaria
| | - Heinrich Zu Dohna
- Department of Biology, American University of Beirut, Beirut 11-0236, Lebanon
| | - Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY 10468, USA; Mental Illness Research Education and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY 10468, USA
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Joachim F Hallmayer
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
| | - Michael P Snyder
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Wing H Wong
- Department of Statistics, Stanford University, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| |
|
12
|
Chen Y, Khan MZ, Wang X, Liang H, Ren W, Kou X, Liu X, Chen W, Peng Y, Wang C. Structural variations in livestock genomes and their associations with phenotypic traits: a review. Front Vet Sci 2024; 11:1416220. [PMID: 39600883 PMCID: PMC11588642 DOI: 10.3389/fvets.2024.1416220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Accepted: 10/29/2024] [Indexed: 11/29/2024] Open
Abstract
Genomic structural variation (SV) refers to differences in gene sequences between individuals on a genomic scale. SVs are widely distributed in the genome, primarily in the form of insertions, deletions, duplications, inversions, and translocations. Because they are characterized by long segments and large coverage, SVs significantly impact the genetic characteristics and production performance of livestock, playing a crucial role in studying breed diversity, biological evolution, and disease correlation. Research on SVs contributes to an enhanced understanding of chromosome function and genetic characteristics and is important for understanding the mechanisms of hereditary diseases. In this article, we review the concept, classification, main formation mechanisms, detection methods, and advancement of research on SVs in the genomes of cattle, buffalo, equines, sheep, and goats, aiming to reveal the genetic basis of differences in phenotypic traits and adaptive genetic mechanisms through genomic research, which will provide a theoretical basis for better understanding and utilizing the genetic resources of herbivorous livestock.
Affiliation(s)
| | - Muhammad Zahoor Khan
- College of Agronomy and Agricultural Engineering Liaocheng University, Liaocheng, China
| | - Yongdong Peng
- College of Agronomy and Agricultural Engineering Liaocheng University, Liaocheng, China
| | - Changfa Wang
- College of Agronomy and Agricultural Engineering Liaocheng University, Liaocheng, China
| |
|
13
|
Wang S, Ye K. Deep-learning based representation and recognition for genome variants-from SNVs to structural variants. Natl Sci Rev 2024; 11:nwae335. [PMID: 39606147 PMCID: PMC11601977 DOI: 10.1093/nsr/nwae335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Revised: 09/13/2024] [Accepted: 09/17/2024] [Indexed: 11/29/2024] Open
Affiliation(s)
- Songbo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, China
- School of Life Science and Technology, Xi'an Jiaotong University, China
- Faculty of Science, Leiden University, The Netherlands
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, China
| |
|
14
|
Dodge TO, Kim BY, Baczenas JJ, Banerjee SM, Gunn TR, Donny AE, Given LA, Rice AR, Haase Cox SK, Weinstein ML, Cross R, Moran BM, Haber K, Haghani NB, Machin Kairuz JA, Gellert HR, Du K, Aguillon SM, Tudor MS, Gutiérrez-Rodríguez C, Rios-Cardenas O, Morris MR, Schartl M, Powell DL, Schumer M. Structural genomic variation and behavioral interactions underpin a balanced sexual mimicry polymorphism. Curr Biol 2024; 34:4662-4676.e9. [PMID: 39326413 DOI: 10.1016/j.cub.2024.08.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/15/2024] [Accepted: 08/29/2024] [Indexed: 09/28/2024]
Abstract
How phenotypic diversity originates and persists within populations are classic puzzles in evolutionary biology. While balanced polymorphisms segregate within many species, it remains rare for both the genetic basis and the selective forces to be known, leading to an incomplete understanding of many classes of traits under balancing selection. Here, we uncover the genetic architecture of a balanced sexual mimicry polymorphism and identify behavioral mechanisms that may be involved in its maintenance in the swordtail fish Xiphophorus birchmanni. We find that ∼40% of X. birchmanni males develop a "false gravid spot," a melanic pigmentation pattern that mimics the "pregnancy spot" associated with sexual maturity in female live-bearing fish. Using genome-wide association mapping, we detect a single intergenic region associated with variation in the false gravid spot phenotype, which is upstream of kitlga, a melanophore patterning gene. By performing long-read sequencing within and across populations, we identify complex structural rearrangements between alternate alleles at this locus. The false gravid spot haplotype drives increased allele-specific expression of kitlga, which provides a mechanistic explanation for the increased melanophore abundance that causes the spot. By studying social interactions in the laboratory and in nature, we find that males with the false gravid spot experience less aggression; however, they also receive increased attention from other males and are disdained by females. These behavioral interactions may contribute to the maintenance of this phenotypic polymorphism in natural populations. We speculate that structural variants affecting gene regulation may be an underappreciated driver of balanced polymorphisms across diverse species.
Affiliation(s)
- Tristram O Dodge
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
| | - Bernard Y Kim
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
| | - John J Baczenas
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
| | - Shreya M Banerjee
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, 475 Storer Mall, Davis, CA 95616, USA
| | - Theresa R Gunn
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
| | - Alex E Donny
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
| | - Lyle A Given
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
| | - Andreas R Rice
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
| | - Sophia K Haase Cox
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
| | - M Luke Weinstein
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
| | - Ryan Cross
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
| | - Benjamin M Moran
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
| | - Kate Haber
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Berkeley High School, 1980 Allston Way, Berkeley, CA 94704, USA
| | - Nadia B Haghani
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
| | - Hannah R Gellert
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
| | - Kang Du
- Xiphophorus Genetic Stock Center, Texas State University, San Marcos, 601 University Drive, San Marcos, TX 78666, USA
| | - Stepfanie M Aguillon
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 612 Charles E. Young Drive South, Los Angeles, CA 90095, USA
| | - M Scarlett Tudor
- Cooperative Extension and Aquaculture Research Institute, University of Maine, 33 Salmon Farm Road, Franklin, ME 04634, USA
| | - Carla Gutiérrez-Rodríguez
- Red de Biología Evolutiva, Instituto de Ecología, A.C., Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz 91073, México
| | - Oscar Rios-Cardenas
- Red de Biología Evolutiva, Instituto de Ecología, A.C., Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz 91073, México
| | - Molly R Morris
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
| | - Manfred Schartl
- Xiphophorus Genetic Stock Center, Texas State University, San Marcos, 601 University Drive, San Marcos, TX 78666, USA; Developmental Biochemistry, Biocenter, University of Würzburg, Am Hubland, 97074 Wuerzburg, Germany
| | - Daniel L Powell
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Department of Biology, Louisiana State University, 202 Life Science Building, Baton Rouge, LA 70803, USA
| | - Molly Schumer
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Howard Hughes Medical Institute, 327 Campus Drive, Stanford, CA 94305, USA
| |
|
15
|
Zheng Y, Shang X. FindCSV: a long-read based method for detecting complex structural variations. BMC Bioinformatics 2024; 25:315. [PMID: 39342151 PMCID: PMC11439270 DOI: 10.1186/s12859-024-05937-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Accepted: 09/18/2024] [Indexed: 10/01/2024] Open
Abstract
BACKGROUND: Structural variations play a significant role in genetic diseases and evolutionary mechanisms. Extensive research has been conducted over the past decade to detect simple structural variations, leading to the development of well-established detection methods. However, recent studies have highlighted the potentially greater impact of complex structural variations on individuals compared to simple structural variations. Despite this, the field still lacks precise detection methods specifically designed for complex structural variations. Therefore, the development of a highly efficient and accurate detection method is of utmost importance. RESULTS: In response to this need, we propose a novel method called FindCSV, which leverages deep learning techniques and consensus sequences to enhance the detection of SVs using long-read sequencing data. Compared to current methods, FindCSV performs better in detecting complex and simple structural variations. CONCLUSIONS: FindCSV is a new method to detect complex and simple structural variations with reasonable accuracy in real and simulated data. The source code for the program is available at https://github.com/nwpuzhengyan/FindCSV.
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China
| |
|
16
|
Höps W, Rausch T, Jendrusch M, Korbel JO, Sedlazeck FJ. Impact and characterization of serial structural variations across humans and great apes. Nat Commun 2024; 15:8007. [PMID: 39266513 PMCID: PMC11393467 DOI: 10.1038/s41467-024-52027-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 08/23/2024] [Indexed: 09/14/2024] Open
Abstract
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.
Affiliation(s)
- Wolfram Höps
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
| | - Michael Jendrusch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| |
|
17
|
Xia Z, Xiang W, Wang Q, Li X, Li Y, Gao J, Tang T, Yang C, Cui Y. CSV-Filter: a deep learning-based comprehensive structural variant filtering method for both short and long reads. Bioinformatics (Oxford, England) 2024; 40:btae539. [PMID: 39240375 PMCID: PMC11419953 DOI: 10.1093/bioinformatics/btae539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 07/29/2024] [Accepted: 09/03/2024] [Indexed: 09/07/2024]
Abstract
MOTIVATION: Structural variants (SVs) play an important role in genetic research and precision medicine. As existing SV detection methods usually contain a substantial number of false positive calls, approaches to filter the detection results are needed. RESULTS: We developed a novel deep learning-based SV filtering tool, CSV-Filter, for both short and long reads. CSV-Filter uses a novel multi-level grayscale image encoding method based on CIGAR strings of the alignment results and employs image augmentation techniques to improve SV feature extraction. CSV-Filter also utilizes self-supervised learning networks for transfer as classification models, and employs mixed-precision operations to accelerate training. The experiments showed that the integration of CSV-Filter with popular SV detection tools could considerably reduce false positive SVs for short and long reads, while maintaining true positive SVs almost unchanged. Compared with DeepSVFilter, an SV filtering tool for short reads, CSV-Filter could recognize more false positive calls and support long reads as an additional feature. AVAILABILITY AND IMPLEMENTATION: https://github.com/xzyschumacher/CSV-Filter.
Affiliation(s)
- Zeyu Xia
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Weiming Xiang
- College of Computer Science and Electronic Engineering, Hunan University, Hunan 410082, P. R. China
| | - Qingzhe Wang
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Xingze Li
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Yilin Li
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Junyu Gao
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Tao Tang
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Canqun Yang
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
- National Supercomputer Center in Tianjin, Tianjin, 300457, P. R. China
- Haihe Lab of ITAI, Tianjin, 300457, P. R. China
| | - Yingbo Cui
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| |
|
18
|
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2024] [Indexed: 02/18/2024]
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Zhang Zhengqian
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Wu Ying
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Wang Jialiang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Liu Yongzhuang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| |
|
19
|
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024]
Abstract
BACKGROUND Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking . CONCLUSIONS This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
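Recall, precision, and breakpoint bias, as assessed in the study above, depend on how calls are matched to a truth set. The sketch below shows one simple way such an evaluation could be set up: a call counts as a true positive when it shares chromosome and SV type with an unmatched truth record and its breakpoint lies within a tolerance. The `SV` record, the `evaluate` function, the greedy first-match rule, and the 500 bp default tolerance are all assumptions for illustration, not the procedure used in the paper.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int      # breakpoint (start) position
    svtype: str   # e.g. "DEL", "INS", "INV"


def evaluate(calls, truth, tol=500):
    """Return (precision, recall, F1) for calls against a truth set.

    Each truth record can be matched by at most one call (greedy, in
    call order). `tol` is a hypothetical breakpoint tolerance in bp.
    """
    unmatched = list(truth)
    tp = 0
    for call in calls:
        for i, t in enumerate(unmatched):
            if (call.chrom == t.chrom and call.svtype == t.svtype
                    and abs(call.pos - t.pos) <= tol):
                tp += 1
                del unmatched[i]
                break
    fp = len(calls) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


truth = [SV("chr1", 1000, "DEL"), SV("chr1", 5000, "INS")]
calls = [SV("chr1", 1100, "DEL"), SV("chr2", 5000, "INS"), SV("chr1", 9000, "DEL")]
print(evaluate(calls, truth))
```

Tightening `tol` makes the evaluation more sensitive to breakpoint deviations, which is one reason reported scores can differ between benchmarks that use different matching criteria.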
Affiliation(s)
- Zhi Liu
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Miaoxin Li
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China.
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China.
| |
|
20
|
Hu H, Gao R, Gao W, Gao B, Jiang Z, Zhou M, Wang G, Jiang T. SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies. Brief Bioinform 2024; 25:bbae336. [PMID: 38980375 PMCID: PMC11232458 DOI: 10.1093/bib/bbae336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 06/03/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024]
Abstract
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Affiliation(s)
- Heng Hu
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Runtian Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Wentao Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Bo Gao
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China
| | - Zhongjun Jiang
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Murong Zhou
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China
| | - Tao Jiang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
| |
|
21
|
Yang X, Zheng G, Jia P, Wang S, Ye K. Pindel-TD: A Tandem Duplication Detector Based on A Pattern Growth Approach. Genomics Proteomics Bioinformatics 2024; 22:qzae008. [PMID: 38862430 PMCID: PMC11425056 DOI: 10.1093/gpbjnl/qzae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 10/24/2023] [Accepted: 11/03/2023] [Indexed: 06/13/2024]
Abstract
Tandem duplication (TD) is a major type of structural variations (SVs) that plays an important role in novel gene formation and human diseases. However, TDs are often missed or incorrectly classified as insertions by most modern SV detection methods due to the lack of specialized operation on TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD is capable of detecting TDs with a wide size range at single nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.
Affiliation(s)
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Gaoyang Zheng
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
| | - Peng Jia
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Songbo Wang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Kai Ye
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Faculty of Science, Leiden University, Leiden 2311 EZ, The Netherlands
| |
|
22
|
Chen Z, Ain NU, Zhao Q, Zhang X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024; 25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024]
Abstract
Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to unearth information about the numerous functional elements within the genome. This endeavor coincided with the emergence of numerous novel technologies, accompanied by the provision of vast amounts of whole-genome sequences, high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from this massive dataset has become a critical aspect of many recent studies, particularly in annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and various functional elements within the genome sequence and infer their biological functions. Traditional wet-lab experimental methods still rely on extensive efforts for functional verification. However, early bioinformatics algorithms and software primarily employed shallow learning techniques; thus, the ability to characterize data and features learning was limited. With the widespread adoption of RNA-Seq technology, scientists from the biological community began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we reviewed both conventional methods and contemporary deep learning frameworks, and highlighted novel perspectives on the challenges arising during annotation underscoring the dynamic nature of this evolving scientific landscape.
Affiliation(s)
- Zhaojia Chen
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong 030600, China
| | - Noor ul Ain
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
| | - Qian Zhao
- State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Xingtan Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
| |
|
23
|
Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun 2024; 15:2447. [PMID: 38503752 PMCID: PMC10951360 DOI: 10.1038/s41467-024-46614-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 03/04/2024] [Indexed: 03/21/2024]
Abstract
Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
Affiliation(s)
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Can Luo
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Staunton G Golding
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Jacob B Ioffe
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA.
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, 37235, Nashville, TN, USA.
| |
|
24
|
Zheng Z, Zhu M, Zhang J, Liu X, Hou L, Liu W, Yuan S, Luo C, Yao X, Liu J, Yang Y. A sequence-aware merger of genomic structural variations at population scale. Nat Commun 2024; 15:960. [PMID: 38307885 PMCID: PMC10837428 DOI: 10.1038/s41467-024-45244-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 01/18/2024] [Indexed: 02/04/2024]
Abstract
Merging structural variations (SVs) at the population level presents a significant challenge, yet it is essential for conducting comprehensive genotypic analyses, especially in the era of pangenomics. Here, we introduce PanPop, a tool that utilizes an advanced sequence-aware SV merging algorithm to efficiently merge SVs of various types. We demonstrate that PanPop can merge and optimize the majority of multiallelic SVs into informative biallelic variants. We show its superior precision and lower rates of missing data compared to alternative software solutions. Our approach not only enables the filtering of SVs by leveraging multiple SV callers for enhanced accuracy but also facilitates the accurate merging of large-scale population SVs. These capabilities of PanPop will help to accelerate future SV-related studies.
Affiliation(s)
- Zeyu Zheng
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Mingjia Zhu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Jin Zhang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Xinfeng Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Liqiang Hou
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Wenyu Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Shuai Yuan
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Changhong Luo
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Xinhao Yao
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
| | - Jianquan Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
| | - Yongzhi Yang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
| |
|
25
|
Zheng Y, Shang X. SVvalidation: A long-read-based validation method for genomic structural variation. PLoS One 2024; 19:e0291741. [PMID: 38181020 PMCID: PMC10769053 DOI: 10.1371/journal.pone.0291741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 09/05/2023] [Indexed: 01/07/2024]
Abstract
Although various methods have been developed to detect structural variations (SVs) in genomic sequences, few are used to validate these results. Several commonly used SV callers produce many false positive SVs, and existing validation methods are not accurate enough. Therefore, a highly efficient and accurate validation method is essential. In response, we propose SVvalidation, a new method that uses long-read sequencing data for validating SVs with higher accuracy and efficiency. Compared to existing methods, SVvalidation performs better in validating SVs in repeat regions and can determine the homozygosity or heterozygosity of an SV. Additionally, SVvalidation offers the highest recall, precision, and F1-score (improving by 7-16%) across all datasets. Moreover, SVvalidation is suitable for different types of SVs. The program is available at https://github.com/nwpuzhengyan/SVvalidation.
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| |
|
26
|
Jia P, Dong L, Yang X, Wang B, Bush SJ, Wang T, Lin J, Wang S, Zhao X, Xu T, Che Y, Dang N, Ren L, Zhang Y, Wang X, Liang F, Wang Y, Ruan J, Xia H, Zheng Y, Shi L, Lv Y, Wang J, Ye K. Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet. Genome Biol 2023; 24:277. [PMID: 38049885 PMCID: PMC10694985 DOI: 10.1186/s13059-023-03116-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/25/2023] [Accepted: 11/21/2023] [Indexed: 12/06/2023]
Abstract
BACKGROUND Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short and long sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technology). RESULTS The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map and for each haplotype. We also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding that of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations which are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity (including those located at long repeat regions, complex structural variants, and de novo mutations) are systematically examined in this study. CONCLUSIONS In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples which, relative to existing benchmarks, offers expanded genomic coverage and insight into complex variant categories.
Affiliation(s)
- Peng Jia
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Lianhua Dong
- National Institute of Metrology, Beijing, 100029, China
| | - Xiaofei Yang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
| | - Bo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Tingjie Wang
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Songbo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Xixi Zhao
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Yizhuo Che
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Ningxin Dang
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Yujing Zhang
- National Institute of Metrology, Beijing, 100029, China
| | - Xia Wang
- National Institute of Metrology, Beijing, 100029, China
| | - Fan Liang
- GrandOmics Biosciences, Beijing, 100089, China
| | - Yang Wang
- GrandOmics Biosciences, Beijing, 100089, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Han Xia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Yi Lv
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
| | - Jing Wang
- National Institute of Metrology, Beijing, 100029, China.
| | - Kai Ye
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
- Faculty of Science, Leiden University, Leiden, 2311EZ, The Netherlands.
| |
|
27
|
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023]
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first "hybrid-hybrid" approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27-95%) and 20% (4-61%). Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Affiliation(s)
- Xiongbin Kang
- College of Biology, Hunan University, Changsha, China
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Jialu Xu
- College of Biology, Hunan University, Changsha, China
| | - Xiao Luo
- College of Biology, Hunan University, Changsha, China.
| | - Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
| |
|
28
|
Wei ZG, Bu PY, Zhang XD, Liu F, Qian Y, Wu FX. invMap: a sensitive mapping tool for long noisy reads with inversion structural variants. Bioinformatics 2023; 39:btad726. [PMID: 38058196 PMCID: PMC11320709 DOI: 10.1093/bioinformatics/btad726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 11/02/2023] [Accepted: 12/05/2023] [Indexed: 12/08/2023] Open
Abstract
MOTIVATION Longer reads produced by PacBio or Oxford Nanopore sequencers could more frequently span the breakpoints of structural variations (SVs) than shorter reads. Therefore, existing long-read mapping methods often generate wrong alignments and variant calls. Compared to deletions and insertions, inversion events are more difficult to detect since the anchors in inversion regions are nonlinear to those in SV-free regions. To address this issue, this study presents a novel long-read mapping algorithm named invMap. RESULTS For each long noisy read, invMap first locates the aligned region with a specifically designed scoring method for chaining, then checks the remaining anchors in the aligned region to discover potential inversions. We benchmark invMap on simulated datasets across different genomes and sequencing coverages. Experimental results demonstrate that invMap locates aligned regions and calls inversion SVs more accurately than the competing methods. The real human genome sequencing dataset of NA12878 illustrates that invMap can effectively find more candidate variant calls for inversions than the competing methods. AVAILABILITY AND IMPLEMENTATION The invMap software is available at https://github.com/zhang134/invMap.git.
Affiliation(s)
- Ze-Gang Wei
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
- Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| | - Peng-Yu Bu
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
| | - Xiao-Dan Zhang
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
| | - Fei Liu
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
| | - Yu Qian
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| |
|
29
|
Aradhya S, Facio FM, Metz H, Manders T, Colavin A, Kobayashi Y, Nykamp K, Johnson B, Nussbaum RL. Applications of artificial intelligence in clinical laboratory genomics. AMERICAN JOURNAL OF MEDICAL GENETICS. PART C, SEMINARS IN MEDICAL GENETICS 2023; 193:e32057. [PMID: 37507620 DOI: 10.1002/ajmg.c.32057] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/15/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Affiliation(s)
- Swaroop Aradhya
- Invitae Corporation, San Francisco, California, USA
- Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
| | | | - Hillery Metz
- Invitae Corporation, San Francisco, California, USA
| | - Toby Manders
- Invitae Corporation, San Francisco, California, USA
| | | | | | - Keith Nykamp
- Invitae Corporation, San Francisco, California, USA
| | | | - Robert L Nussbaum
- Invitae Corporation, San Francisco, California, USA
- Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA
| |
|
30
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Jonathan Elliot Perdomo
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
31
|
Yu J, Chen N, Zheng Z, Gao M, Liang N, Wong KC. Chromothripsis detection with multiple myeloma patients based on deep graph learning. Bioinformatics 2023; 39:btad422. [PMID: 37399092 PMCID: PMC10343948 DOI: 10.1093/bioinformatics/btad422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 06/20/2023] [Accepted: 06/30/2023] [Indexed: 07/05/2023] Open
Abstract
MOTIVATION Chromothripsis, associated with poor clinical outcomes, is prognostically vital in multiple myeloma. The catastrophic event is reported to be detectable prior to the progression of multiple myeloma. As a result, chromothripsis detection can contribute to risk estimation and early treatment guidelines for multiple myeloma patients. However, manual diagnosis remains the gold standard approach to detect chromothripsis events with the whole-genome sequencing technology to retrieve both copy number variation (CNV) and structural variation data. Meanwhile, CNV data are much easier to obtain than structural variation data. Hence, in order to reduce the reliance on human experts' efforts and structural variation data extraction, it is necessary to establish a reliable and accurate chromothripsis detection method based on CNV data. RESULTS To address those issues, we propose a method to detect chromothripsis solely based on CNV data. With the help of structure learning, the intrinsic relationship-directed acyclic graph of CNV features is inferred to derive a CNV embedding graph (i.e. CNV-DAG). Subsequently, a neural network based on Graph Transformer, local feature extraction, and non-linear feature interaction, is proposed with the embedding graph as the input to distinguish whether the chromothripsis event occurs. Ablation experiments, clustering, and feature importance analysis are also conducted to enable the proposed model to be explained by capturing mechanistic insights. AVAILABILITY AND IMPLEMENTATION The source code and data are freely available at https://github.com/luvyfdawnYu/CNV_chromothripsis.
Affiliation(s)
- Jixiang Yu
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
| | - Nanjun Chen
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
| | - Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
| | - Ming Gao
- School of Management Science and Engineering, Dongbei University of Finance and Economics, Dalian 116025, China
| | - Ning Liang
- University of Michigan, Ann Arbor, MI 48105, United States
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen 518057, China
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
| |
|
32
|
Gao Y, Yang X, Chen H, Tan X, Yang Z, Deng L, Wang B, Kong S, Li S, Cui Y, Lei C, Wang Y, Pan Y, Ma S, Sun H, Zhao X, Shi Y, Yang Z, Wu D, Wu S, Zhao X, Shi B, Jin L, Hu Z, Lu Y, Chu J, Ye K, Xu S. A pangenome reference of 36 Chinese populations. Nature 2023; 619:112-121. [PMID: 37316654 PMCID: PMC10322713 DOI: 10.1038/s41586-023-06173-7] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Received: 09/23/2022] [Accepted: 05/05/2023] [Indexed: 06/16/2023]
Abstract
Human genomics is witnessing an ongoing paradigm shift from a single reference sequence to a pangenome form, but populations of Asian ancestry are underrepresented. Here we present data from the first phase of the Chinese Pangenome Consortium, including a collection of 116 high-quality and haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With an average 30.65× high-fidelity long-read sequence coverage, an average contiguity N50 of more than 35.63 megabases and an average total size of 3.01 gigabases, the CPC core assemblies add 189 million base pairs of euchromatic polymorphic sequences and 1,367 protein-coding gene duplications to GRCh38. We identified 15.9 million small variants and 78,072 structural variants, of which 5.9 million small variants and 34,223 structural variants were not reported in a recently released pangenome reference (ref. 1). The Chinese Pangenome Consortium data demonstrate a remarkable increase in the discovery of novel and missing sequences when individuals are included from underrepresented minority ethnic groups. The missing reference sequences were enriched with archaic-derived alleles and genes that confer essential functions related to keratinization, response to ultraviolet radiation, DNA repair, immunological responses and lifespan, implying great potential for shedding new light on human evolution and recovering missing heritability in complex disease mapping.
Affiliation(s)
- Yang Gao
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
| | - Hao Chen
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Xinjiang Tan
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Zhaoqing Yang
- Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China
| | - Lian Deng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Baonan Wang
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Shuang Kong
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Songyang Li
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Yuhang Cui
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Chang Lei
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Yimin Wang
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yuwen Pan
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Sen Ma
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Hao Sun
- Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China
| | - Xiaohan Zhao
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Yingbing Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Ziyi Yang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
| | - Shaoyuan Wu
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, China
| | - Xingming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Ministry of Education Key (MOE) Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science Fudan University, Shanghai, China
| | - Binyin Shi
- Department of Endocrinology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Zhibin Hu
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Yan Lu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China.
| | - Jiayou Chu
- Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China.
| | - Kai Ye
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China.
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China.
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, China.
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
| |
|
33
|
Zheng Y, Shang X. SVcnn: an accurate deep learning-based method for detecting structural variation based on long-read data. BMC Bioinformatics 2023; 24:213. [PMID: 37221476 DOI: 10.1186/s12859-023-05324-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/22/2023] [Accepted: 05/06/2023] [Indexed: 05/25/2023] Open
Abstract
BACKGROUND Structural variations (SVs) refer to variations in an organism's chromosome structure that exceed a length of 50 base pairs. They play a significant role in genetic diseases and evolutionary mechanisms. While long-read sequencing technology has led to the development of numerous SV callers, their performance has been suboptimal. Researchers have observed that current SV callers often miss true SVs and generate many false SVs, especially in repetitive regions and areas with multi-allelic SVs. These errors are due to the messy alignments of long-read data, which are affected by their high error rate. Therefore, there is a need for a more accurate SV caller. RESULT We propose SVcnn, a more accurate deep learning-based method for detecting SVs from long-read sequencing data. We run SVcnn and other SV callers on three real datasets and find that SVcnn improves the F1-score by 2-8% compared with the second-best method when the read depth is greater than 5×. More importantly, SVcnn has better performance for detecting multi-allelic SVs. CONCLUSIONS SVcnn is an accurate deep learning-based method to detect SVs. The program is available at https://github.com/nwpuzhengyan/SVcnn.
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
| |
|
34
|
Lin J, Jia P, Wang S, Kosters W, Ye K. Comparison and benchmark of structural variants detected from long read and long-read assembly. Brief Bioinform 2023:7169138. [PMID: 37200087 DOI: 10.1093/bib/bbad188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/20/2023] Open
Abstract
Structural variant (SV) detection is essential for genomic studies, and long-read sequencing technologies have advanced our capacity to detect SVs directly from reads or de novo assemblies, known as the read-based and assembly-based strategies. However, to date, no independent studies have compared and benchmarked the two strategies. Here, on the basis of SVs detected by 20 read-based and eight assembly-based detection pipelines from six datasets of the HG002 genome, we investigated the factors that influence the two strategies and assessed their performance with well-curated SVs. We found that up to 80% of the SVs could be detected by both strategies among different long-read datasets, whereas the variant type, size, and breakpoint detected by the read-based strategy were greatly affected by aligners. For the high-confidence insertions and deletions at non-tandem repeat regions, a remarkable subset of them (82% in assembly-based calls and 93% in read-based calls), accounting for around 4000 SVs, could be captured by both reads and assemblies. However, discordance between the two strategies was largely caused by complex SVs and inversions, which resulted from inconsistent alignment of reads and assemblies at these loci. Finally, benchmarking with SVs at medically relevant genes, the recall of the read-based strategy reached 77% on 5X coverage data, whereas the assembly-based strategy required 20X coverage data to achieve similar performance. Therefore, integrating SVs from reads and assemblies is suggested for general-purpose detection because of inconsistently detected complex SVs and inversions, whereas the assembly-based strategy is optional for applications with limited resources.
Affiliation(s)
- Jiadong Lin
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden 2311 EZ, The Netherlands
| | - Peng Jia
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Songbo Wang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Walter Kosters
- Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden 2311 EZ, The Netherlands
| | - Kai Ye
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- The School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Faculty of Science, Leiden University, Leiden 2311, The Netherlands
| |
|
35
|
Sikic M. Facilitating genome structural variation analysis. Nat Methods 2023; 20:491-492. [PMID: 36959321 DOI: 10.1038/s41592-023-01767-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Affiliation(s)
- Mile Sikic
- Laboratory of AI in Genomics, Genome Institute of Singapore, A*STAR, Singapore, Singapore.
- Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.
| |
|
36
|
Ma H, Zhong C, Chen D, He H, Yang F. cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network. BMC Bioinformatics 2023; 24:119. [PMID: 36977976 PMCID: PMC10045035 DOI: 10.1186/s12859-023-05243-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/14/2022] [Accepted: 03/21/2023] [Indexed: 03/30/2023] Open
Abstract
BACKGROUND Genomic structural variant detection is a significant and challenging issue in genome analysis. The existing long-read based structural variant detection methods still have room for improvement in detecting multi-type structural variants. RESULTS In this paper, we propose a method called cnnLSV to obtain higher-quality detection results by eliminating false positives from the results merged from the callsets of existing methods. We design an encoding strategy for four types of structural variants that represents the long-read alignment information around structural variants as images, input the images into a constructed convolutional neural network to train a filter model, and load the trained model to remove the false positives and improve the detection performance. We also eliminate mislabeled training samples in the model training phase by using principal component analysis and the unsupervised clustering algorithm k-means. Experimental results on both simulated and real datasets show that our proposed method outperforms existing methods overall in detecting insertions, deletions, inversions, and duplications. The program of cnnLSV is available at https://github.com/mhuidong/cnnLSV. CONCLUSIONS The proposed cnnLSV can detect structural variants by using long-read alignment information and a convolutional neural network to achieve overall higher performance, and effectively eliminates incorrectly labeled samples by using the principal component analysis and k-means algorithms in the model training stage.
Affiliation(s)
- Huidong Ma
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
| | - Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China.
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China.
| | - Danyang Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
| | - Haofa He
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
| | - Feng Yang
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
| |
|
37
|
Yang L, Yang Y, Huang L, Cui X, Liu Y. From single- to multi-omics: future research trends in medicinal plants. Brief Bioinform 2022; 24:6840072. [PMID: 36416120 PMCID: PMC9851310 DOI: 10.1093/bib/bbac485] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/15/2022] [Revised: 10/13/2022] [Accepted: 10/14/2022] [Indexed: 11/25/2022] Open
Abstract
Medicinal plants are the main source of natural metabolites with specialised pharmacological activities and have been widely examined by plant researchers. Numerous omics studies of medicinal plants have been performed to identify molecular markers of species and functional genes controlling key biological traits, as well as to understand the biosynthetic pathways of bioactive metabolites and the regulatory mechanisms of environmental responses. Omics technologies have been widely applied to medicinal plants, including taxonomics, transcriptomics, metabolomics, proteomics, genomics, pangenomics, epigenomics and mutagenomics. However, because of the complexity of biological regulatory networks, single-omics studies usually fail to explain specific biological phenomena. In recent years, reports of integrated multi-omics studies of medicinal plants have increased. Until now, there have been few assessments of recent developments and upcoming trends in omics studies of medicinal plants. We highlight recent developments in omics research on medicinal plants, summarise the typical bioinformatics resources available for analysing omics datasets, and discuss related future directions and challenges. This information should facilitate further studies of medicinal plants and the refinement of current approaches, and lead to new ideas.
Affiliation(s)
- Lifang Yang
- Kunming University of Science and Technology, China
| | - Ye Yang
- Kunming University of Science and Technology, China
| | - Luqi Huang
- Academician of the Chinese Academy of Engineering; studies the development of traditional Chinese medicine; Chinese Academy of Chinese Medical Sciences, China
| | - Xiuming Cui
- Corresponding author. Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
| | - Yuan Liu
- Corresponding author. Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
| |
|