1
Cui Y, Peng C, Xia Z, Yang C, Guo Y. A survey of sequence-to-graph mapping algorithms in the pangenome era. Genome Biol 2025; 26:138. [PMID: 40405275 PMCID: PMC12096488 DOI: 10.1186/s13059-025-03606-6] [Received: 07/19/2024] [Accepted: 05/06/2025] [Indexed: 05/24/2025] [Open Access]
Abstract
A pangenome can reveal the genetic diversity across different individuals simultaneously. It offers a more comprehensive reference for genome analysis compared to a single linear genome that may introduce allele bias. Pangenomes are often represented as genome graphs, making sequence-to-graph mapping a fundamental task for pangenome construction and analysis. Numerous sequence-to-graph mapping algorithms have been developed over the past few years. Here, we provide a review of the advancements in sequence-to-graph mapping algorithms in the pangenome era. We also discuss the challenges and opportunities that arise in the context of pangenome graphs.
Affiliation(s)
- Yingbo Cui
- College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China.
| | - Chenchen Peng
- College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China
| | - Zeyu Xia
- College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China
| | - Canqun Yang
- College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China
- National Supercomputer Center in Tianjin, No.10 Xinhuan West Rd, 300457, Tianjin, People's Republic of China
| | - Yifei Guo
- College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China.
2
Marin MG, Quinones-Olvera N, Wippel C, Behruznia M, Jeffrey BM, Harris M, Mann BC, Rosenthal A, Jacobson KR, Warren RM, Li H, Meehan CJ, Farhat MR. Pitfalls of bacterial pan-genome analysis approaches: a case study of Mycobacterium tuberculosis and two less clonal bacterial species. Bioinformatics 2025; 41:btaf219. [PMID: 40341387 PMCID: PMC12119186 DOI: 10.1093/bioinformatics/btaf219] [Received: 04/22/2024] [Revised: 12/31/2024] [Accepted: 05/07/2025] [Indexed: 05/10/2025] [Open Access]
Abstract
SUMMARY: Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety in methods used to define and measure the pan-genome poses challenges to the interpretation and reliability of results. Using Mycobacterium tuberculosis, a clonally evolving bacterium with a small accessory genome, as a model system, we systematically evaluated sources of variability in pan-genome estimates. Our analysis revealed that differences in assembly type (short-read versus hybrid), annotation pipeline, and pan-genome software significantly impact predictions of core and accessory genome size. Extending our analysis to two additional bacterial species, Escherichia coli and Staphylococcus aureus, we observed consistent tool-dependent biases but species-specific patterns in pan-genome variability. Our findings highlight the importance of integrating nucleotide- and protein-level analyses to improve the reliability and reproducibility of pan-genome studies across diverse bacterial populations.
AVAILABILITY AND IMPLEMENTATION: Panqc is freely available under an MIT license at https://github.com/maxgmarin/panqc.
Affiliation(s)
- Maximillian G Marin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Natalia Quinones-Olvera
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Christoph Wippel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Mahboobeh Behruznia
- Department of Biosciences, Nottingham Trent University, Nottingham, NG1 4FQ, United Kingdom
| | - Brendan M Jeffrey
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, United States
| | - Michael Harris
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, United States
| | - Brendon C Mann
- Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Stellenbosch, Western Cape, 7602, South Africa
| | - Alex Rosenthal
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, United States
| | - Karen R Jacobson
- Division of Infectious Diseases, Chobanian & Avedisian School of Medicine, Boston University, Boston, MA 02118, United States
| | - Robin M Warren
- Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Stellenbosch, Western Cape, 7602, South Africa
| | - Heng Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, United States
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, United States
| | - Conor J Meehan
- Department of Biosciences, Nottingham Trent University, Nottingham, NG1 4FQ, United Kingdom
- Unit of Mycobacteriology, Institute of Tropical Medicine, Antwerp, 2000, Belgium
| | - Maha R Farhat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA 02114, United States
3
Lian Q, Zhang Y, Zhang J, Peng Z, Wang W, Du M, Li H, Zhang X, Cheng L, Du R, Zhou Z, Yang Z, Xin G, Pu Y, Feng Z, Wu Q, Xuanyuan G, Bai S, Hu R, Negrão S, Bryan GJ, Bachem CWB, Zhou Y, Zhang R, Shang Y, Huang S, Lin T, Qi J. A genomic variation map provides insights into potato evolution and key agronomic traits. Mol Plant 2025; 18:570-589. [PMID: 39861948 DOI: 10.1016/j.molp.2025.01.016] [Received: 07/24/2024] [Revised: 12/07/2024] [Accepted: 01/22/2025] [Indexed: 01/27/2025]
Abstract
Hybrid potato breeding based on diploid inbred lines is transforming the way of genetic improvement of this staple food crop, which requires a deep understanding of potato domestication and differentiation. In the present study, we resequenced 314 diploid wild and landrace accessions to generate a variome map of 47,203,407 variants. Using the variome map, we discovered the reshaping of tuber transcriptome during potato domestication, characterized genome-wide differentiation between landrace groups Stenotomum and Phureja. We identified a jasmonic acid biosynthetic gene possibly affecting the tuber dormancy period. Genome-wide association studies revealed a UDP-glycosyltransferase gene for the biosynthesis of anti-nutritional steroidal glycoalkaloids (SGAs), and a Dehydration Responsive Element Binding (DREB) transcription factor conferring increased average tuber weight. In addition, genome similarity and group-specific SNP analyses indicated that tetraploid potatoes originated from the diploid Solanum tuberosum group Stenotomum. These findings shed light on the evolutionary trajectory of potato domestication and improvement, providing a solid foundation for advancing hybrid potato-breeding practices.
Affiliation(s)
- Qun Lian
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
| | - Yingying Zhang
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Jinzhe Zhang
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Zhen Peng
- College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
| | - Weilun Wang
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Miru Du
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Hongbo Li
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; Plant Breeding, Wageningen University & Research, P.O. Box 386, 6700 AJ Wageningen, the Netherlands
| | - Xinyan Zhang
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Lin Cheng
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Ran Du
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Zijian Zhou
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Zhenqiang Yang
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Guohui Xin
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Yuanyuan Pu
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Zhiwen Feng
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Qian Wu
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Guochao Xuanyuan
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Shunbuer Bai
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Rong Hu
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Sónia Negrão
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
| | - Glenn J Bryan
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
| | - Christian W B Bachem
- Plant Breeding, Wageningen University & Research, P.O. Box 386, 6700 AJ Wageningen, the Netherlands
| | - Yongfeng Zhou
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Ruofang Zhang
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Yi Shang
- Yunnan Key Laboratory of Potato Biology, CAAS-YNNU-YINMORE Joint Academy of Potato Sciences, Yunnan Normal University, Kunming, China
| | - Sanwen Huang
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
| | - Tao Lin
- College of Horticulture, China Agricultural University, Beijing 100193, China.
| | - Jianjian Qi
- Inner Mongolia Potato Engineering and Technology Research Center, Key Laboratory of Herbage and Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China.
4
Li H, Li J, Li X, Li J, Chen D, Zhang Y, Yu Q, Yang F, Liu Y, Dai W, Sun Y, Li P, Schranz ME, Ma F, Zhao T. Genomic investigation of plant secondary metabolism: insights from synteny network analysis of oxidosqualene cyclase flanking genes. New Phytol 2025; 245:2150-2169. [PMID: 39731256 DOI: 10.1111/nph.20357] [Received: 08/08/2024] [Accepted: 11/28/2024] [Indexed: 12/29/2024]
Abstract
The clustered distribution of genes involved in metabolic pathways within the plant genome has garnered significant attention from researchers. By comparing and analyzing changes in the flanking regions of metabolic genes across a diverse array of species, we can enhance our understanding of the formation and distribution of biosynthetic gene clusters (BGCs). In this study, we have designed a workflow that uncovers and assesses conserved positional relationships between genes in various species by using synteny neighborhood networks (SNN). This workflow is then applied to the analysis of flanking genes associated with oxidosqualene cyclases (OSCs). The method allows for the recognition and comparison of homologous blocks with unique flanking genes accompanying different subfamilies of OSCs. The examination of the flanking genes of OSCs in 122 plant species revealed multiple genes with conserved positional relationships with OSCs in angiosperms. Specifically, the earliest adjacency of OSC genes and CYP716 genes first appeared in basal eudicots, and the nonrandom occurrence of CYP716 genes in the flanking region of OSC persists across different lineages of eudicots. Our study showed the substitution of genes in the flanking region of the OSC varies across different plant lineages, and our approach facilitates the investigation of flanking gene rearrangements in the formation of OSC-related BGCs.
Affiliation(s)
- Haochen Li
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Jiale Li
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Xinchu Li
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Jialin Li
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Dan Chen
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Yangxin Zhang
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Qiaoming Yu
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Fan Yang
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Yunxiao Liu
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Weidong Dai
- Tea Research Institute, Chinese Academy of Agricultural Sciences, Hangzhou, Zhejiang, 310008, China
| | - Yaqiang Sun
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Pengmin Li
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - M Eric Schranz
- Biosystematics Group, Wageningen University and Research, 6708 PB, Wageningen, the Netherlands
| | - Fengwang Ma
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
| | - Tao Zhao
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, 712100, China
5
Edwards SV, Fang B, Khost D, Kolyfetis GE, Cheek RG, DeRaad DA, Chen N, Fitzpatrick JW, McCormack JE, Funk WC, Ghalambor CK, Garrison E, Guarracino A, Li H, Sackton TB. Comparative population pangenomes reveal unexpected complexity and fitness effects of structural variants. bioRxiv 2025:2025.02.11.637762. [PMID: 39990470 PMCID: PMC11844517 DOI: 10.1101/2025.02.11.637762] [Indexed: 02/25/2025]
Abstract
Structural variants (SVs) are widespread in vertebrate genomes, yet their evolutionary dynamics remain poorly understood. Using 45 long-read de novo genome assemblies and pangenome tools, we analyze SVs within three closely related species of North American jays (Aphelocoma, scrub-jays) displaying a 60-fold range in effective population size. We find rapid evolution of genome architecture, including ~100 Mb variation in genome size driven by dynamic satellite landscapes with unexpectedly long (> 10 kb) repeat units and widespread variation in gene content, influencing gene expression. SVs exhibit slightly deleterious dynamics modulated by variant length and population size, with strong evidence of adaptive fixation only in large populations. Our results demonstrate how population size shapes the distribution of SVs and the importance of pangenomes to characterizing genomic diversity.
Affiliation(s)
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
| | - Bohao Fang
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
| | - Danielle Khost
- Informatics Group, Harvard University, 52 Oxford St, Cambridge, MA, 02138, USA
| | - George E Kolyfetis
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
| | - Rebecca G Cheek
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, 1878 Campus Delivery, Fort Collins, CO, 80523, USA
| | - Devon A DeRaad
- Moore Laboratory of Zoology, Occidental College, 1600 Campus Rd, Los Angeles, CA, 90041, USA
| | - Nancy Chen
- Department of Biology, University of Rochester, 477 Hutchison Hall, Box 270211, Rochester, NY, 14627, USA
| | - John W Fitzpatrick
- Cornell Lab of Ornithology, Cornell University, 159 Sapsucker Woods Rd, Ithaca, NY, 14850, USA
| | - John E. McCormack
- Moore Laboratory of Zoology, Occidental College, 1600 Campus Rd, Los Angeles, CA, 90041, USA
| | - W. Chris Funk
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, 1878 Campus Delivery, Fort Collins, CO, 80523, USA
| | - Cameron K Ghalambor
- Department of Biology, Norwegian University of Science and Technology, Høgskoleringen 5, Realfagbygget D1-137, Trondheim, 7491, Norway
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S. Manassas Street, Memphis, TN, 38163, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S. Manassas Street, Memphis, TN, 38163, USA
| | - Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Mailstop: CLSB 11007, Boston, MA, 02215
| | - Timothy B Sackton
- Informatics Group, Harvard University, 52 Oxford St, Cambridge, MA, 02138, USA
6
Madorsky Rowdo FP, Martini R, Ackermann SE, Tang CP, Tranquille M, Irizarry A, Us I, Alawa O, Moyer JE, Sigouros M, Nguyen J, Assaad MA, Cheng E, Ginter PS, Manohar J, Stonaker B, Boateng R, Oppong JK, Adjei EK, Awuah B, Kyei I, Aitpillah FS, Adinku MO, Ankomah K, Osei-Bonsu EB, Gyan KK, Hoda S, Newman L, Mosquera JM, Sboner A, Elemento O, Dow LE, Davis MB, Martin ML. Kinome-Focused CRISPR-Cas9 Screens in African Ancestry Patient-Derived Breast Cancer Organoids Identify Essential Kinases and Synergy of EGFR and FGFR1 Inhibition. Cancer Res 2025; 85:551-566. [PMID: 39891928 PMCID: PMC11790258 DOI: 10.1158/0008-5472.can-24-0775] [Received: 03/08/2024] [Revised: 08/10/2024] [Accepted: 11/20/2024] [Indexed: 02/03/2025]
Abstract
Precision medicine approaches to cancer treatment aim to exploit genomic alterations that are specific to individual patients to tailor therapeutic strategies. Yet, some targetable genes and pathways are essential for tumor cell viability even in the absence of direct genomic alterations. In underrepresented populations, the mutational landscape and determinants of response to existing therapies are poorly characterized because of limited inclusion in clinical trials and studies. One way to reveal tumor essential genes is with genetic screens. Most screens are conducted on cell lines that bear little resemblance to patient tumors, after years of culture under nonphysiologic conditions. To address this problem, we aimed to develop a CRISPR screening pipeline in three-dimensionally grown patient-derived tumor organoid (PDTO) models. A breast cancer PDTO biobank that focused on underrepresented populations, including West African patients, was established and used to conduct a negative-selection kinome-focused CRISPR screen to identify kinases essential for organoid growth and potential targets for combination therapy with EGFR or MEK inhibitors. The screen identified several previously unidentified kinase targets, and the combination of FGFR1 and EGFR inhibitors synergized to block organoid proliferation. Together, these data demonstrate the feasibility of CRISPR-based genetic screens in patient-derived tumor models, including PDTOs from underrepresented patients with cancer, and identify targets for cancer therapy. Significance: Generation of a breast cancer patient-derived tumor organoid biobank focused on underrepresented populations enabled kinome-focused CRISPR screening that identified essential kinases and potential targets for combination therapy with EGFR or MEK inhibitors. See related commentary by Trembath and Spanheimer, p. 407.
Affiliation(s)
| | - Rachel Martini
- Department of Surgery, Weill Cornell Medicine, New York, NY, USA
- Institute of Translational Genomic Medicine, Morehouse School of Medicine, GA, USA
| | - Sarah E. Ackermann
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Colin P. Tang
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Marvel Tranquille
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Adriana Irizarry
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Ilkay Us
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Omar Alawa
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Jenna E. Moyer
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Michael Sigouros
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - John Nguyen
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Majd Al Assaad
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Esther Cheng
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Paula S. Ginter
- Department of Pathology, NYU Langone Hospital-Long Island, Mineola, NY, USA
| | - Jyothi Manohar
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Brian Stonaker
- Department of Surgery, Weill Cornell Medicine, New York, NY, USA
| | - Ishmael Kyei
- Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
| | - Kofi K. Gyan
- Department of Surgery, Weill Cornell Medicine, New York, NY, USA
| | - Syed Hoda
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Lisa Newman
- Department of Surgery, Weill Cornell Medicine, New York, NY, USA
| | - Juan Miguel Mosquera
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Andrea Sboner
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Olivier Elemento
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Lukas E. Dow
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Department of Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY, USA
| | - Melissa B. Davis
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- Department of Surgery, Weill Cornell Medicine, New York, NY, USA
- Institute of Translational Genomic Medicine, Morehouse School of Medicine, GA, USA
| | - M. Laura Martin
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
7
Wagner J, Olson ND, McDaniel J, Harris L, Pinto BJ, Jáspez D, Muñoz-Barrera A, Rubio-Rodríguez LA, Lorenzo-Salazar JM, Flores C, Sahraeian SME, Narzisi G, Byrska-Bishop M, Evani US, Xiao C, Lake JA, Fontana P, Greenberg C, Freed D, Mootor MFE, Boutros PC, Murray L, Shafin K, Carroll A, Sedlazeck FJ, Wilson M, Zook JM. Small variant benchmark from a complete assembly of X and Y chromosomes. Nat Commun 2025; 16:497. [PMID: 39779690 PMCID: PMC11711550 DOI: 10.1038/s41467-024-55710-z] [Received: 04/29/2024] [Accepted: 12/19/2024] [Indexed: 01/11/2025] [Open Access]
Abstract
The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions and across short and long read callsets. We show how complete assemblies can expand benchmarks to difficult regions, but highlight remaining challenges benchmarking variants in long homopolymers and tandem repeats, complex gene conversions, copy number variable gene arrays, and human satellites.
Affiliation(s)
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
| | - Lindsay Harris
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
| | - Brendan J Pinto
- Center for Evolution & Medicine and School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, USA
| | - David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | - Adrián Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | - Luis A Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | - José M Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | - Carlos Flores
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, Madrid, Spain
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Instituto de Investigación Sanitaria de Canarias, Santa Cruz de Tenerife, Spain
- Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, Las Palmas de Gran Canaria, Spain
| | | | | | | | | | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | - Peter Fontana
- Information Technology Laboratory, National Institute of Standards and Technology, 100 Bureau Dr. Mailstop 8940, Gaithersburg, MD, USA
| | - Craig Greenberg
- Information Technology Laboratory, National Institute of Standards and Technology, 100 Bureau Dr. Mailstop 8940, Gaithersburg, MD, USA
| | | | | | - Paul C Boutros
- Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
| | | | - Kishwar Shafin
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Andrew Carroll
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Fritz J Sedlazeck
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Melissa Wilson
- Center for Evolution & Medicine and School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
|
8
|
Li H. BWT construction and search at the terabase scale. Bioinformatics 2024; 40:btae717. [PMID: 39607778 PMCID: PMC11646566 DOI: 10.1093/bioinformatics/btae717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Revised: 11/06/2024] [Accepted: 11/26/2024] [Indexed: 11/30/2024] Open
Abstract
MOTIVATION: The Burrows-Wheeler Transform (BWT) is a common component in full-text indices. Initially developed for data compression, it is particularly powerful for encoding redundant sequences such as pangenome data. However, BWT construction is resource intensive and hard to parallelize, and many methods for querying large full-text indices only report exact matches or their simple extensions. These limitations have hampered the biological applications of full-text indices. RESULTS: We developed ropebwt3 for efficient BWT construction and query. Ropebwt3 indexed 320 assembled human genomes in 65 h and indexed 7.3 terabases of commonly studied bacterial assemblies in 26 days. This was achieved using up to 170 gigabytes of memory at the peak without working disk space. Ropebwt3 can find maximal exact matches and inexact alignments under affine-gap penalties, and can retrieve similar local haplotypes matching a query sequence. It demonstrates the feasibility of full-text indexing at the terabase scale. AVAILABILITY AND IMPLEMENTATION: https://github.com/lh3/ropebwt3.
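As a rough illustration of what a BWT-based full-text index does, the following Python sketch builds a BWT by sorting rotations and counts exact matches with backward search. It is a toy under stated assumptions, not ropebwt3's method: the function names `bwt` and `count_occurrences` are hypothetical, the construction is quadratic in memory, and nothing here reflects the incremental construction, maximal-exact-match finding, or affine-gap alignment described in the abstract.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Naive BWT: sort all rotations of text + '$' and take the last column.

    Assumes '$' does not occur in text and sorts before every other symbol.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern using backward search on the BWT."""
    counts = Counter(bwt_str)

    # first_row[c] = number of symbols in the text strictly smaller than c
    first_row, total = {}, 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    # Shrink the suffix-array interval [lo, hi) one symbol at a time,
    # reading the pattern from right to left.
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ[ch][lo]
        hi = first_row[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences(bwt("GATTACA"), "TA")` searches the transformed string without ever scanning the original text. Practical tools replace the dense `occ` table and the rotation sort with compressed, sampled structures.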
Affiliation(s)
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, United States
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
|
9
|
Fang B, Edwards SV. Fitness consequences of structural variation inferred from a House Finch pangenome. Proc Natl Acad Sci U S A 2024; 121:e2409943121. [PMID: 39531493 PMCID: PMC11588099 DOI: 10.1073/pnas.2409943121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 10/03/2024] [Indexed: 11/16/2024] Open
Abstract
Genomic structural variants (SVs) play a crucial role in adaptive evolution, yet their average fitness effects and characterization with pangenome tools are understudied in wild animal populations. We constructed a pangenome for House Finches (Haemorhous mexicanus), a model for studies of host-pathogen coevolution, using long-read sequence data on 16 individuals (32 de novo-assembled haplotypes) and one outgroup. We identified 887,118 SVs larger than 50 base pairs, mostly (60%) involving repetitive elements, with reduced SV diversity in the eastern US as a result of its introduction by humans. The distribution of fitness effects of genome-wide SVs was estimated using maximum likelihood approaches and revealed that SVs in both coding and noncoding regions were on average more deleterious than smaller indels or single nucleotide polymorphisms. The reference-free pangenome facilitated identification of a >10-My-old, 11-megabase-long pericentric inversion on chromosome 1. We found that the genotype frequencies of the inversion, estimated from 135 birds widely sampled temporally and geographically, increased steadily over the 25 y since House Finches were first exposed to the bacterial pathogen Mycoplasma gallisepticum and showed signatures of balancing selection, capturing genes related to immunity and telomerase activity. We also observed shorter telomeres in populations with a greater number of years of exposure to Mycoplasma. Our study illustrates the utility of long-read sequencing and pangenome methods for understanding wild animal populations, estimating fitness effects of genome-wide SVs, and advancing our understanding of adaptive evolution through structural variation.
Affiliation(s)
- Bohao Fang
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
| | - Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
|
10
|
Chandra G, Gibney D, Jain C. Haplotype-aware sequence alignment to pangenome graphs. Genome Res 2024; 34:1265-1275. [PMID: 39013594 PMCID: PMC11529843 DOI: 10.1101/gr.279143.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 06/24/2024] [Indexed: 07/18/2024]
Abstract
Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing alignments that are consistent with the known haplotypes improves genotyping accuracy. However, the existing rigorous formulations for colinear chaining and alignment problems do not consider the haplotype paths in a pangenome graph. This often leads to spurious read alignments to those paths that are unlikely recombinations of the known haplotypes. In this paper, we develop novel formulations and algorithms for sequence-to-graph alignment and chaining problems. Inspired by the genotype imputation models, we assume that a query sequence is an imperfect mosaic of reference haplotypes. Accordingly, we introduce a recombination penalty in the scoring functions for each haplotype switch. First, we solve haplotype-aware sequence-to-graph alignment in [Formula: see text] time, where Q is the query sequence, E is the set of edges, and H is the set of haplotypes represented in the graph. To complement our solution, we prove that an algorithm significantly faster than [Formula: see text] is impossible under the strong exponential time hypothesis (SETH). Second, we propose a haplotype-aware chaining algorithm that runs in [Formula: see text] time after graph preprocessing, where N is the count of input anchors. We then establish that a chaining algorithm significantly faster than [Formula: see text] is impossible under SETH. As a proof-of-concept, we implemented our chaining algorithm in the Minichain aligner. By aligning sequences sampled from the human major histocompatibility complex (MHC) to a pangenome graph of 60 MHC haplotypes, we demonstrate that our algorithm achieves better consistency with ground-truth recombinations compared with a haplotype-agnostic algorithm.
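The recombination penalty can be illustrated with a small dynamic program. The Python sketch below is a deliberately simplified toy, not the alignment or chaining algorithm from the paper: it assumes an already ordered chain of anchors, each annotated with the set of haplotypes that pass through it, and finds the cheapest way to explain the chain as a mosaic of haplotypes. The function name `min_recombination_cost`, the input layout, and the default penalty of 1.0 are all hypothetical.

```python
import math


def min_recombination_cost(anchor_haplotypes, penalty=1.0):
    """Minimum total haplotype-switch penalty along an ordered anchor chain.

    anchor_haplotypes: list of sets; element i holds the ids of the
    haplotypes that support anchor i (assumed input layout).
    Returns math.inf if some anchor is supported by no haplotype.
    """
    if not anchor_haplotypes:
        return 0.0

    # prev[h] = cheapest cost of explaining the anchors so far, ending on h
    prev = {h: 0.0 for h in anchor_haplotypes[0]}
    for haplotypes in anchor_haplotypes[1:]:
        best_prev = min(prev.values(), default=math.inf)
        cur = {}
        for h in haplotypes:
            stay = prev.get(h, math.inf)   # continue on the same haplotype
            switch = best_prev + penalty   # recombine from the best other state
            cur[h] = min(stay, switch)
        prev = cur
    return min(prev.values(), default=math.inf)
```

With a chain such as `[{"h1", "h2"}, {"h1"}, {"h1", "h3"}, {"h3"}]`, the cheapest explanation follows `h1` for the first three anchors and switches once to `h3`. A haplotype-agnostic scorer would treat every chain through these anchors as equally good, which is the kind of spurious mosaic the recombination penalty is meant to discourage.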
Affiliation(s)
- Ghanshyam Chandra
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, Karnataka 560012, India
| | - Daniel Gibney
- Department of Computer Science, The University of Texas at Dallas, Richardson, Texas 75080, USA
| | - Chirag Jain
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, Karnataka 560012, India
|