1
Elfman J, Goins L, Heller T, Singh S, Wang YH, Li H. Discovery of a polymorphic gene fusion via bottom-up chimeric RNA prediction. Nucleic Acids Res 2024; 52:4409-4421. [PMID: 38587197 PMCID: PMC11077074 DOI: 10.1093/nar/gkae258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Accepted: 03/27/2024] [Indexed: 04/09/2024] Open
Abstract
Gene fusions and their chimeric products are commonly linked with cancer. However, recent studies have found chimeric transcripts in non-cancer tissues and cell lines. Large-scale efforts to annotate structural variations have identified gene fusions capable of generating chimeric transcripts even in normal tissues. In this study, we present a bottom-up approach targeting population-specific chimeric RNAs, identifying 58 such instances in the GTEx cohort, including notable cases such as SUZ12P1-CRLF3, TFG-ADGRG7 and TRPM4-PPFIA3, which possess distinct patterns across different ancestry groups. We provide direct evidence for an additional 29 polymorphic chimeric RNAs with associated structural variants, revealing 13 novel rare structural variants. Additionally, we utilize the All of Us dataset and a large cohort of clinical samples to characterize the association of the SUZ12P1-CRLF3-causing variant with patient phenotypes. Our study showcases SUZ12P1-CRLF3 as a representative example, illustrating the identification of elusive structural variants by focusing on those producing population-specific fusion transcripts.
Affiliation(s)
- Justin Elfman
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Lynette Goins
- Department of Biological Sciences, Clemson University, Clemson, SC 29631, USA
- Tessa Heller
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Sandeep Singh
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Computational Toxicology Facility, CSIR-Indian Institute of Toxicology Research, Lucknow, 226001, Uttar Pradesh, India
- Yuh-Hwa Wang
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Hui Li
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Department of Pathology, University of Virginia, Charlottesville, VA 22903, USA
2
Yang L, Yin H, Bai L, Yao W, Tao T, Zhao Q, Gao Y, Teng J, Xu Z, Lin Q, Diao S, Pan Z, Guan D, Li B, Zhou H, Zhou Z, Zhao F, Wang Q, Pan Y, Zhang Z, Li K, Fang L, Liu GE. Mapping and functional characterization of structural variation in 1060 pig genomes. Genome Biol 2024; 25:116. [PMID: 38715020 PMCID: PMC11075355 DOI: 10.1186/s13059-024-03253-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2022] [Accepted: 04/19/2024] [Indexed: 05/12/2024] Open
Abstract
BACKGROUND: Structural variations (SVs) have significant impacts on complex phenotypes by rearranging large amounts of DNA sequence. RESULTS: We present a comprehensive SV catalog based on the whole-genome sequence of 1060 pigs (Sus scrofa) representing 101 breeds, covering 9.6% of the pig genome. This catalog includes 42,487 deletions, 37,913 mobile element insertions, 3308 duplications, 1664 inversions, and 45,184 break ends. Estimates of breed ancestry and hybridization using genotyped SVs align well with those from single nucleotide polymorphisms. Geographically stratified deletions are observed, along with known duplications of the KIT gene, responsible for white coat color in European pigs. Additionally, we identify a recent SINE element insertion in MYO5A transcripts of European pigs, potentially influencing alternative splicing patterns and coat color alterations. Furthermore, a Yorkshire-specific copy number gain within ABCG2 is found, impacting chromatin interactions and gene expression across multiple tissues over a genomic region of ~200 kb. Preliminary investigations into the impact of SVs on gene expression and traits using the Pig Genotype-Tissue Expression (PigGTEx) data reveal SV associations with regulatory variants and gene-trait pairs. For instance, a 51-bp deletion is linked to the lead eQTL of the lipid metabolism regulating gene FADS3, whose expression in the embryo may affect loin muscle area, as revealed by our transcriptome-wide association studies. CONCLUSIONS: This SV catalog serves as a valuable resource for studying diversity, evolutionary history, and functional shaping of the pig genome by processes like domestication, trait-based breeding, and adaptive evolution.
Affiliation(s)
- Liu Yang
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
- Hongwei Yin
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lijing Bai
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Wenye Yao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Tan Tao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Qianyi Zhao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Yahui Gao
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
- Jinyan Teng
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhiting Xu
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Qing Lin
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Shuqi Diao
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhangyuan Pan
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Dailu Guan
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Bingjie Li
- Animal and Veterinary Sciences, Scotland's Rural College (SRUC), Roslin Institute Building, Easter Bush, Midlothian, EH25 9RG, United Kingdom
- Huaijun Zhou
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Zhongyin Zhou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Fuping Zhao
- Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Qishan Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Yuchun Pan
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Kui Li
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
- Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark.
- George E Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA.
3
Malamon JS, Farrell JJ, Xia LC, Dombroski BA, Das RG, Way J, Kuzma AB, Valladares O, Leung YY, Scanlon AJ, Lopez IAB, Brehony J, Worley KC, Zhang NR, Wang LS, Farrer LA, Schellenberg GD, Lee WP, Vardarajan BN. A comparative study of structural variant calling in WGS from Alzheimer's disease families. Life Sci Alliance 2024; 7:e202302181. [PMID: 38418088 PMCID: PMC10902710 DOI: 10.26508/lsa.202302181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 02/07/2024] [Accepted: 02/08/2024] [Indexed: 03/01/2024] Open
Abstract
Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins. We developed a novel metric, D-score, to evaluate caller specificity for deletions. The accuracy of deletions was evaluated by Sanger sequencing. We generated a high-quality call set of 152,301 deletions of diverse sizes. Sanger sequencing validated 114 of 146 detected deletions (78.1%). Scalpel excelled in accuracy for deletions ≤100 bp, whereas Parliament was optimal for deletions >900 bp. Overall, 83.0% and 72.5% of calls by Scalpel and Parliament were validated, respectively, including all 11 deletions called by both Parliament and Scalpel between 101 and 900 bp. Our flexible protocol successfully generated a high-quality deletion call set and a truth set of Sanger sequencing-validated deletions with precise breakpoints spanning 1-17,000 bp.
Affiliation(s)
- John S Malamon
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- John J Farrell
- Biomedical Genetics Section, Department of Medicine, Boston University School of Medicine, Boston University, Boston, MA, USA
- Li Charlie Xia
- https://ror.org/03mtd9a03 Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Beth A Dombroski
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Rueben G Das
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jessica Way
- Broad Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
- Amanda B Kuzma
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Otto Valladares
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Allison J Scanlon
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Irving Antonio Barrera Lopez
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jack Brehony
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Kim C Worley
- https://ror.org/02pttbw34 Human Genome Sequencing Center, and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Nancy R Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Lindsay A Farrer
- Biomedical Genetics Section, Department of Medicine, Boston University School of Medicine, Boston University, Boston, MA, USA
- Departments of Neurology and Ophthalmology, Boston University School of Medicine, Boston University, Boston, MA, USA
- Departments of Epidemiology and Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Badri N Vardarajan
- https://ror.org/01esghr10 Gertrude H. Sergievsky Center and Taub Institute of Aging Brain, Department of Neurology, Columbia University Medical Center, New York, NY, USA
4
Schuetz RJ, Ceyhan D, Antoniou AA, Chaudhari BP, White P. CNVoyant: A Highly Performant and Explainable Multi-Classifier Machine Learning Approach for Determining the Clinical Significance of Copy Number Variants. Res Sq 2024:rs.3.rs-4308324. [PMID: 38746157 PMCID: PMC11092842 DOI: 10.21203/rs.3.rs-4308324/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
The precise classification of copy number variants (CNVs) presents a significant challenge in genomic medicine, primarily due to the complex nature of CNVs and their diverse impact on genetic disorders. This complexity is compounded by the limitations of existing methods in accurately distinguishing between benign, uncertain, and pathogenic CNVs. Addressing this gap, we introduce CNVoyant, a machine learning-based multi-class framework designed to enhance the clinical significance classification of CNVs. Trained on a comprehensive dataset of 52,176 ClinVar entries across pathogenic, uncertain, and benign classifications, CNVoyant incorporates a broad spectrum of genomic features, including genome position, disease-gene annotations, dosage sensitivity, and conservation scores. Models to predict the clinical significance of copy number gains and losses were trained independently. Final models were selected after testing 29 machine learning architectures and 10,000 hyperparameter combinations each for deletions and duplications via 5-fold cross-validation. We validate the performance of CNVoyant by leveraging a comprehensive set of 21,574 CNVs from the DECIPHER database, a highly regarded resource known for its extensive catalog of chromosomal imbalances linked to clinical outcomes. Compared to alternative approaches, CNVoyant shows marked improvements in precision-recall and ROC AUC metrics for binary pathogenic classifications while going one step further, offering multi-classification of clinical significance and corresponding SHAP explainability plots. This large-scale validation demonstrates CNVoyant's superior accuracy and underscores its potential to aid genomic researchers and clinical geneticists in interpreting the clinical implications of real CNVs.
5
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes is integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
6
Ding W, Li X, Zhang J, Ji M, Zhang M, Zhong X, Cao Y, Liu X, Li C, Xiao C, Wang J, Li T, Yu Q, Mo F, Zhang B, Qi J, Yang JC, Qi J, Tian L, Xu X, Peng Q, Zhou WZ, Liu Z, Fu A, Zhang X, Zhang JJ, Sun Y, Hu B, An NA, Zhang L, Li CY. Adaptive functions of structural variants in human brain development. Sci Adv 2024; 10:eadl4600. [PMID: 38579006 DOI: 10.1126/sciadv.adl4600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 03/01/2024] [Indexed: 04/07/2024]
Abstract
Quantifying the structural variants (SVs) in nonhuman primates could provide a niche to clarify the genetic backgrounds underlying human-specific traits, but such a resource is largely lacking. Here, we report an accurate SV map in a population of 562 rhesus macaques, verified by in-house benchmarks of eight macaque genomes with long-read sequencing and another one with genome assembly. This map indicates stronger selective constraints on inversions at regulatory regions, suggesting a strategy for prioritizing them with the most important functions. Accordingly, we identified 75 human-specific inversions and prioritized them. The top-ranked inversions have substantially shaped the human transcriptome, through their dual effects of reconfiguring the ancestral genomic architecture and introducing regional mutation hotspots at the inverted regions. As a proof of concept, we linked APCDD1, located on one of these inversions and down-regulated specifically in humans, to neuronal maturation and cognitive ability. We thus highlight the role of inversions in shaping human uniqueness in brain development.
Affiliation(s)
- Wanqiu Ding
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xiangshang Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jie Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Mingjun Ji
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Mengling Zhang
- State Key Laboratory of Membrane Biology, Biomedical Pioneer Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Xiaoming Zhong
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Center of Excellence for Leukemia Studies, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Yong Cao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, 119S Fourth Ring Rd W, Fengtai District, Beijing, China
- Xiaoge Liu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chunqiong Li
- Chinese Institute for Brain Research, Beijing, China
- Chunfu Xiao
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jiaxin Wang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Ting Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Qing Yu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Boya Zhang
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jianhuan Qi
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jie-Chun Yang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Juntian Qi
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Lu Tian
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xinwei Xu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Qi Peng
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Wei-Zhen Zhou
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zhijin Liu
- College of Life Sciences, Capital Normal University, Beijing, China
- Aisi Fu
- Wuhan Dgensee Clinical Laboratory, Wuhan, China
- Xiuqin Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jian-Jun Zhang
- Shanxi Key Laboratory of Chinese Medicine Encephalopathy, National International Joint Research Center for Molecular Chinese Medicine, Shanxi University of Chinese Medicine, Jinzhong, China
- Yujie Sun
- State Key Laboratory of Membrane Biology, Biomedical Pioneer Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Baoyang Hu
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Ni A An
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- National Biomedical Imaging Center, College of Future Technology, Peking University, Beijing, China
- Li Zhang
- Chinese Institute for Brain Research, Beijing, China
- Chuan-Yun Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- National Biomedical Imaging Center, College of Future Technology, Peking University, Beijing, China
- Southwest United Graduate School, Kunming 650092, China
7
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Marschall T, Li H, Paten B, Abel HJ, Antonacci-Fulton LL, Asri M, Baid G, Baker CA, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson MJP, Chang PC, Chang XH, Cheng H, Chu J, Cody S, Colonna V, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler EE, Eizenga JM, Fairley S, Fedrigo O, Felsenfeld AL, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton RS, Gao Y, Garg S, Garrison E, Garrison NA, Giron CG, Green RE, Groza C, Guarracino A, Haggerty L, Hall IM, Harvey WT, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis ED, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Li H, Liao WW, Lu S, Lu TY, Lucas JK, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin FJ, McCartney A, McDaniel J, Miga KH, Mitchell MW, Monlong J, Mountcastle J, Munson KM, Mwaniki MN, Nattestad M, Novak AM, Nurk S, Olsen HE, Olson ND, Paten B, Pesout T, Phillippy AM, Popejoy AB, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Sibbesen JA, Sirén J, Smith MW, Sofia HJ, Tayoun ANA, Thibaud-Nissen F, Tomlinson C, Tricomi FF, Villani F, Vollger MR, Wagner J, Walenz B, Wang T, Wood JMD, Zimin AV, Zook JM. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol 2024; 42:663-673. [PMID: 37165083 PMCID: PMC10638906 DOI: 10.1038/s41587-023-01793-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 10/06/2022] [Accepted: 04/18/2023] [Indexed: 05/12/2023]
Abstract
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
Affiliation(s)
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Jordan M. Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | | | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Haley J. Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Carl A. Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Canadian Center for Computational Genomics, McGill University, Montreal, QC, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
| | | | - Mark J. P. Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | | | - Xian H. Chang
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | | | - Robert M. Cook-Deegan
- Arizona State University, Barrett and O’Connor Washington Center, Washington, DC, USA
| | - Omar E. Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Daniel Doerr
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Peter Ebert
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Jana Ebler
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Jordan M. Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam L. Felsenfeld
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Robert S. Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | - Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Nanibaa’ A. Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Carlos Garcia Giron
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Richard E. Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
| | - Cristian Groza
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ira M. Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
| | - William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - David Haussler
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
| | - Miten Jain
- Northeastern University, Boston, MA, USA
| | - Erich D. Jarvis
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Hanlee P. Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Eimear E. Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Barbara A. Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | | | - Jan O. Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Julian K. Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Hugo Magalhães
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Pierre Marijon
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Charles Markello
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Tobias Marschall
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Fergal J. Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
| | | | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E. Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice B. Popejoy
- Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A. Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ashley D. Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I. Schultz
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Jonas A. Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Michael W. Smith
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J. Sofia
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N. Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children’s Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Aleksey V. Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
|
8
|
Hujoel MLA, Handsaker RE, Sherman MA, Kamitaki N, Barton AR, Mukamel RE, Terao C, McCarroll SA, Loh PR. Protein-altering variants at copy number-variable regions influence diverse human phenotypes. Nat Genet 2024; 56:569-578. [PMID: 38548989 PMCID: PMC11018521 DOI: 10.1038/s41588-024-01684-z] [Received: 06/23/2023] [Accepted: 02/08/2024] [Indexed: 04/09/2024]
Abstract
Copy number variants (CNVs) are among the largest genetic variants, yet CNVs have not been effectively ascertained in most genetic association studies. Here we ascertained protein-altering CNVs from UK Biobank whole-exome sequencing data (n = 468,570) using haplotype-informed methods capable of detecting subexonic CNVs and variation within segmental duplications. Incorporating CNVs into analyses of rare variants predicted to cause gene loss of function (LOF) identified 100 associations of predicted LOF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 conferred one of the strongest protective effects of gene LOF on hypertension risk (odds ratio = 0.86 (0.82-0.90)). Protein-coding variation in rapidly evolving gene families within segmental duplications (previously invisible to most analysis methods) generated some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Affiliation(s)
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Serinus Biosciences Inc., New York, NY, USA
| | - Nolan Kamitaki
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
| | - Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
9
|
Chen Z, Finnell RH, Lei Y, Wang H. Progress and clinical prospect of genomic structural variants investigation. Sci Bull (Beijing) 2024; 69:705-708. [PMID: 38310047 DOI: 10.1016/j.scib.2024.01.035] [Indexed: 02/05/2024]
Affiliation(s)
- Zhongzhong Chen
- Obstetrics and Gynecology Hospital, State Key Laboratory of Genetic Engineering, Institute of Reproduction and Development, Fudan University, Shanghai 200011, China; Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200062, China
| | - Richard H Finnell
- Center for Precision Environmental Health, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston 77030, USA; Departments of Molecular and Human Genetics and Medicine, Baylor College of Medicine, One Baylor Plaza, Houston 77030, USA
| | - Yunping Lei
- Center for Precision Environmental Health, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston 77030, USA.
| | - Hongyan Wang
- Obstetrics and Gynecology Hospital, State Key Laboratory of Genetic Engineering, Institute of Reproduction and Development, Fudan University, Shanghai 200011, China; Shanghai Key Laboratory of Metabolic Remodelling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai 200438, China; Children's Hospital of Fudan University, Shanghai 201102, China.
|
10
|
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. medRxiv 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs) - insertions, deletions, and complex rearrangements - can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4x increase from short-reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that don't incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression towards improving the prioritization of functional SVs and TREs in rare disease patients.
|
11
|
Subramanian K, Chopra M, Kahali B. Landscape of genomic structural variations in Indian population-based cohorts: Deeper insights into their prevalence and clinical relevance. HGG Adv 2024; 5:100285. [PMID: 38521976 PMCID: PMC11007539 DOI: 10.1016/j.xhgg.2024.100285] [Received: 09/14/2023] [Revised: 03/13/2024] [Accepted: 03/20/2024] [Indexed: 03/25/2024]
Abstract
Structural variations (SVs) are large (>50 base pairs) genomic rearrangements comprising deletions, duplications, insertions, inversions, and translocations. Studying SVs is important because they play active and critical roles in regulating gene expression, determining disease predispositions, and identifying population-specific differences among individuals of diverse ancestries. However, SV discoveries in the Indian population using whole-genome sequencing (WGS) have been limited. In this study, using short-read WGS with an average 42X depth of coverage, we identify and characterize 36,210 SVs from 529 individuals enrolled in population-based cohorts in India. These SVs include 24,574 deletions, 2,913 duplications, 8,710 insertions, and 13 inversions; 1.26% (456 out of 36,210) of the identified SVs can potentially impact the coding regions of genes. Furthermore, 56 of these SVs are highly intolerant to loss-of-function changes to the mapped genes, and five SVs impacting ADAMTS17, CCDC40, and RHCE are common in our study individuals. Seven rare SVs significantly impact dosage sensitivity of genes known to be associated with various clinical phenotypes. Most of the SVs in our study are rare and heterozygous. This fine-scale SV discovery in the underrepresented Indian population provides valuable insights that extend beyond Eurocentric human genetic studies.
Affiliation(s)
- Krithika Subramanian
- Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India; Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
| | - Mehak Chopra
- Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India
| | - Bratati Kahali
- Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India.
|
12
|
Budurlean L, Tukaramrao DB, Zhang L, Dovat S, Broach J. Integrating Optical Genome Mapping and Whole Genome Sequencing in Somatic Structural Variant Detection. J Pers Med 2024; 14:291. [PMID: 38541033 PMCID: PMC10971281 DOI: 10.3390/jpm14030291] [Received: 02/08/2024] [Revised: 03/01/2024] [Accepted: 03/07/2024] [Indexed: 04/10/2024]
Abstract
Structural variants drive tumorigenesis by disrupting normal gene function through insertions, inversions, translocations, and copy number changes, including deletions and duplications. Detecting structural variants is crucial for revealing their roles in tumor development, clinical outcomes, and personalized therapy. Presently, most studies rely on short-read data from next-generation sequencing that aligns back to a reference genome to determine if and, if so, where a structural variant occurs. However, structural variant discovery by short-read sequencing is challenging, primarily because of the difficulty in mapping regions of repetitive sequences. Optical genome mapping (OGM) is a recent technology used for imaging and assembling long DNA strands to detect structural variations. To capture the structural variant landscape more thoroughly in the human genome, we developed an integrated pipeline that combines Bionano OGM and Illumina whole-genome sequencing and applied it to samples from 29 pediatric B-ALL patients. The addition of OGM allowed us to identify 511 deletions, 506 insertions, 93 duplications/gains, and 145 translocations that were otherwise missed in the short-read data. Moreover, we identified several novel gene fusions, the expression of which was confirmed by RNA sequencing. Our results highlight the benefit of integrating OGM and short-read detection methods to obtain a comprehensive analysis of genetic variation that can aid in clinical diagnosis, provide new therapeutic targets, and improve personalized medicine in cancers driven by structural variation.
Affiliation(s)
- Laura Budurlean
- Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA
| | | | - Lijun Zhang
- Department of Population & Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Sinisa Dovat
- Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA
- Department of Pediatrics, Penn State Cancer Institute, Hershey, PA 17033, USA
| | - James Broach
- Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA
|
13
|
Berdan EL, Aubier TG, Cozzolino S, Faria R, Feder JL, Giménez MD, Joron M, Searle JB, Mérot C. Structural Variants and Speciation: Multiple Processes at Play. Cold Spring Harb Perspect Biol 2024; 16:a041446. [PMID: 38052499 PMCID: PMC10910405 DOI: 10.1101/cshperspect.a041446] [Indexed: 12/07/2023]
Abstract
Research on the genomic architecture of speciation has increasingly revealed the importance of structural variants (SVs) that affect the presence, abundance, position, and/or direction of a nucleotide sequence. SVs include large chromosomal rearrangements such as fusion/fissions and inversions and translocations, as well as smaller variants such as duplications, insertions, and deletions (CNVs). Although we have ample evidence that SVs play a key role in speciation, the underlying mechanisms differ depending on the type and length of the SV, as well as the ecological, demographic, and historical context. We review predictions and empirical evidence for classic processes such as underdominance due to meiotic aberrations and the coupling effect of recombination suppression before exploring how recent sequencing methodologies illuminate the prevalence and diversity of SVs. We discuss specific properties of SVs and their impact throughout the genome, highlighting that multiple processes are at play, and possibly interacting, in the relationship between SVs and speciation.
Affiliation(s)
- Emma L Berdan
- Department of Marine Sciences, Gothenburg University, Gothenburg 40530, Sweden
- Bioinformatics Core, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard Medical School, Boston, Massachusetts 02115, USA
| | - Thomas G Aubier
- Laboratoire Évolution & Diversité Biologique, Université Paul Sabatier Toulouse III, UMR 5174, CNRS/IRD, 31077 Toulouse, France
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| | - Salvatore Cozzolino
- Department of Biology, University of Naples Federico II, Complesso Universitario di Monte S. Angelo, 80126 Napoli, Italia
| | - Rui Faria
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO, Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, 4485-661 Vairão, Portugal
| | - Jeffrey L Feder
- Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana 46556, USA
| | - Mabel D Giménez
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Instituto de Genética Humana de Misiones (IGeHM), Parque de la Salud de la Provincia de Misiones "Dr. Ramón Madariaga," N3300KAZ Posadas, Misiones, Argentina
- Facultad de Ciencias Exactas, Químicas y Naturales, Universidad Nacional de Misiones, N3300LQH Posadas, Misiones, Argentina
| | - Mathieu Joron
- Centre d'Ecologie Fonctionnelle et Evolutive, Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - Jeremy B Searle
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York 14853, USA
| | - Claire Mérot
- CNRS, UMR 6553 Ecobio, OSUR, Université de Rennes, 35000 Rennes, France
| |
|
14
|
Audano PA, Beck CR. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res 2024; 34:7-19. [PMID: 38176712 PMCID: PMC10904011 DOI: 10.1101/gr.278203.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
| |
|
15
|
Liu X, Hu F, Zhang D, Li Z, He J, Zhang S, Wang Z, Zhao Y, Wu J, Liu C, Li C, Li X, Wu J. Whole genome sequencing enables new genetic diagnosis for inherited retinal diseases by identifying pathogenic variants. NPJ Genom Med 2024; 9:6. [PMID: 38245557 PMCID: PMC10799956 DOI: 10.1038/s41525-024-00391-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 12/19/2023] [Indexed: 01/22/2024] Open
Abstract
Inherited retinal diseases (IRDs) are a group of common primary retinal degenerative disorders. Conventional genetic testing strategies, such as panel-based sequencing and whole exome sequencing (WES), can only elucidate the genetic etiology in approximately 60% of IRD patients. Studies have suggested that unsolved IRD cases could be attributed to previously undetected structural variants (SVs) and intronic variants in IRD-related genes. The aim of our study was to obtain a definitive genetic diagnosis by employing whole genome sequencing (WGS) in IRD cases where the causative genes were inconclusive following an initial screening by panel sequencing. A total of 271 unresolved IRD patients and their available family members (n = 646) were screened using WGS to identify pathogenic SVs and intronic variants in 792 known ocular disease genes. Overall, 13% (34/271) of IRD patients received a confirmed genetic diagnosis, among which 7% were exclusively attributed to SVs, 4% to a combination of single nucleotide variants (SNVs) and SVs while another 2% were linked to intronic variants. 22 SVs, 3 deep-intronic variants, and 2 non-canonical splice-site variants across 14 IRD genes were identified in the entire cohort. Notably, all of these detected SVs and intronic variants were novel pathogenic variants. Among those, 74% (20/27) of variants were found in genes causally linked to Retinitis Pigmentosa (RP), with the gene EYS being the most frequently affected by SVs. The identification of SVs and intronic variants through WGS enhances the genetic diagnostic yield of IRDs and broadens the mutational spectrum of known IRD-associated genes.
Affiliation(s)
- Xubing Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Fangyuan Hu
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University); Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Daowei Zhang
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University); Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Zhe Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Jianquan He
- Computer Center, Eye & ENT Hospital, Fudan University, Shanghai, China
| | - Shenghai Zhang
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University); Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Zhenguo Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yingke Zhao
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University); Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Jiawen Wu
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University); Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Chen Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Chenchen Li
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University); Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Xin Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Jihong Wu
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Myopia (Fudan University); Key Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| |
|
16
|
Bailey SM, Cross EM, Kinner-Bibeau L, Sebesta HC, Bedford JS, Tompkins CJ. Monitoring Genomic Structural Rearrangements Resulting from Gene Editing. J Pers Med 2024; 14:110. [PMID: 38276232 PMCID: PMC10817574 DOI: 10.3390/jpm14010110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/04/2024] [Accepted: 01/13/2024] [Indexed: 01/27/2024] Open
Abstract
The cytogenomics-based methodology of directional genomic hybridization (dGH) enables the detection and quantification of a more comprehensive spectrum of genomic structural variants than any other approach currently available, and importantly, does so on a single-cell basis. Thus, dGH is well-suited for testing and/or validating new advancements in CRISPR-Cas9 gene editing systems. In addition to aberrations detected by traditional cytogenetic approaches, the strand specificity of dGH facilitates detection of otherwise cryptic intra-chromosomal rearrangements, specifically small inversions. As such, dGH represents a powerful, high-resolution approach for the quantitative monitoring of potentially detrimental genomic structural rearrangements resulting from exposure to agents that induce DNA double-strand breaks (DSBs), including restriction endonucleases and ionizing radiations. For intentional genome editing strategies, it is critical that any undesired effects of DSBs induced either by the editing system itself or by mis-repair with other endogenous DSBs are recognized and minimized. In this paper, we discuss the application of dGH for assessing gene editing-associated structural variants and the potential heterogeneity of such rearrangements among cells within an edited population, highlighting its relevance to personalized medicine strategies.
Affiliation(s)
- Susan M. Bailey
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
- KromaTiD, Inc., Longmont, CO 80501, USA
| | - Erin M. Cross
- KromaTiD, Inc., Longmont, CO 80501, USA
| | | | - Henry C. Sebesta
- KromaTiD, Inc., Longmont, CO 80501, USA
| | - Joel S. Bedford
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
- KromaTiD, Inc., Longmont, CO 80501, USA
| | | |
|
17
|
Benfica LF, Brito LF, do Bem RD, Mulim HA, Glessner J, Braga LG, Gloria LS, Cyrillo JNSG, Bonilha SFM, Mercadante MEZ. Genome-wide association study between copy number variation and feeding behavior, feed efficiency, and growth traits in Nellore cattle. BMC Genomics 2024; 25:54. [PMID: 38212678 PMCID: PMC10785391 DOI: 10.1186/s12864-024-09976-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 01/04/2024] [Indexed: 01/13/2024] Open
Abstract
BACKGROUND Feeding costs represent the largest expenditures in beef production. Therefore, the animal efficiency in converting feed in high-quality protein for human consumption plays a major role in the environmental impact of the beef industry and in the beef producers' profitability. In this context, breeding animals for improved feed efficiency through genomic selection has been considered as a strategic practice in modern breeding programs around the world. Copy number variation (CNV) is a less-studied source of genetic variation that can contribute to phenotypic variability in complex traits. In this context, this study aimed to: (1) identify CNV and CNV regions (CNVRs) in the genome of Nellore cattle (Bos taurus indicus); (2) assess potential associations between the identified CNVR and weaning weight (W210), body weight measured at the time of selection (WSel), average daily gain (ADG), dry matter intake (DMI), residual feed intake (RFI), time spent at the feed bunk (TF), and frequency of visits to the feed bunk (FF); and, (3) perform functional enrichment analyses of the significant CNVR identified for each of the traits evaluated. RESULTS A total of 3,161 CNVs and 561 CNVRs ranging from 4,973 bp to 3,215,394 bp were identified. The CNVRs covered up to 99,221,894 bp (3.99%) of the Nellore autosomal genome. Seventeen CNVR were significantly associated with dry matter intake and feeding frequency (number of daily visits to the feed bunk). The functional annotation of the associated CNVRs revealed important candidate genes related to metabolism that may be associated with the phenotypic expression of the evaluated traits. Furthermore, Gene Ontology (GO) analyses revealed 19 enrichment processes associated with FF. CONCLUSIONS A total of 3,161 CNVs and 561 CNVRs were identified and characterized in a Nellore cattle population. Various CNVRs were significantly associated with DMI and FF, indicating that CNVs play an important role in key biological pathways and in the phenotypic expression of feeding behavior and growth traits in Nellore cattle.
Affiliation(s)
- Lorena F Benfica
- Department of Animal Sciences, Purdue University, 270 S. Russell Street, West Lafayette, IN, 47907, USA
- Department of Animal Science, Faculty of Agricultural and Veterinary Sciences, Sao Paulo State University, Jaboticabal, SP, Brazil
| | - Luiz F Brito
- Department of Animal Sciences, Purdue University, 270 S. Russell Street, West Lafayette, IN, 47907, USA
| | - Ricardo D do Bem
- Department of Animal Science, Faculty of Agricultural and Veterinary Sciences, Sao Paulo State University, Jaboticabal, SP, Brazil
| | - Henrique A Mulim
- Department of Animal Sciences, Purdue University, 270 S. Russell Street, West Lafayette, IN, 47907, USA
| | - Joseph Glessner
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Larissa G Braga
- Department of Animal Science, Faculty of Agricultural and Veterinary Sciences, Sao Paulo State University, Jaboticabal, SP, Brazil
| | - Leonardo S Gloria
- Department of Animal Sciences, Purdue University, 270 S. Russell Street, West Lafayette, IN, 47907, USA
| | | | | | | |
|
18
|
Auwerx C, Jõeloo M, Sadler MC, Tesio N, Ojavee S, Clark CJ, Mägi R, Reymond A, Kutalik Z. Rare copy-number variants as modulators of common disease susceptibility. Genome Med 2024; 16:5. [PMID: 38185688 PMCID: PMC10773105 DOI: 10.1186/s13073-023-01265-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024] Open
Abstract
BACKGROUND Copy-number variations (CNVs) have been associated with rare and debilitating genomic disorders (GDs) but their impact on health later in life in the general population remains poorly described. METHODS Assessing four modes of CNV action, we performed genome-wide association scans (GWASs) between the copy-number of CNV-proxy probes and 60 curated ICD-10 based clinical diagnoses in 331,522 unrelated white British UK Biobank (UKBB) participants with replication in the Estonian Biobank. RESULTS We identified 73 signals involving 40 diseases, all of which indicating that CNVs increased disease risk and caused earlier onset. We estimated that 16% of these associations are indirect, acting by increasing body mass index (BMI). Signals mapped to 45 unique, non-overlapping regions, nine of which being linked to known GDs. Number and identity of genes affected by CNVs modulated their pathogenicity, with many associations being supported by colocalization with both common and rare single-nucleotide variant association signals. Dissection of association signals provided insights into the epidemiology of known gene-disease pairs (e.g., deletions in BRCA1 and LDLR increased risk for ovarian cancer and ischemic heart disease, respectively), clarified dosage mechanisms of action (e.g., both increased and decreased dosage of 17q12 impacted renal health), and identified putative causal genes (e.g., ABCC6 for kidney stones). Characterization of the pleiotropic pathological consequences of recurrent CNVs at 15q13, 16p13.11, 16p12.2, and 22q11.2 in adulthood indicated variable expressivity of these regions and the involvement of multiple genes. Finally, we show that while the total burden of rare CNVs-and especially deletions-strongly associated with disease risk, it only accounted for ~ 0.02% of the UKBB disease burden. These associations are mainly driven by CNVs at known GD CNV regions, whose pleiotropic effect on common diseases was broader than anticipated by our CNV-GWAS. CONCLUSIONS Our results shed light on the prominent role of rare CNVs in determining common disease susceptibility within the general population and provide actionable insights for anticipating later-onset comorbidities in carriers of recurrent CNVs.
Affiliation(s)
- Chiara Auwerx
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland
| | - Maarja Jõeloo
- Institute of Molecular and Cell Biology, University of Tartu, 51010, Tartu, Estonia
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
| | - Marie C Sadler
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland
| | - Nicolò Tesio
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
| | - Sven Ojavee
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Charlie J Clark
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
| | - Reedik Mägi
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
| | - Zoltán Kutalik
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland
| |
|
19
|
Behera S, Catreux S, Rossi M, Truong S, Huang Z, Ruehle M, Visvanath A, Parnaby G, Roddey C, Onuchic V, Cameron DL, English A, Mehtalia S, Han J, Mehio R, Sedlazeck FJ. Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms. bioRxiv 2024:2024.01.02.573821. [PMID: 38260545 PMCID: PMC10802302 DOI: 10.1101/2024.01.02.573821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN that utilizes novel methods based on multigenomes, hardware acceleration, and machine learning based variant detection to provide novel insights into individual genomes with ~30min computation time (from raw reads to variant detection). DRAGEN outperforms all other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | | | | | | | | | | | | | | | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, TX, USA
- Department of Computer Science, Rice University, TX, USA
| |
|
20
|
Fair T, Pavlovic BJ, Schaefer NK, Pollen AA. Mapping cis- and trans-regulatory target genes of human-specific deletions. bioRxiv 2023:2023.12.27.573461. [PMID: 38234800 PMCID: PMC10793408 DOI: 10.1101/2023.12.27.573461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Deletion of functional sequence is predicted to represent a fundamental mechanism of molecular evolution. Comparative genetic studies of primates have identified thousands of human-specific deletions (hDels), and the cis-regulatory potential of short (≤31 base pairs) hDels has been assessed using reporter assays. However, how structural variant-sized (≥50 base pairs) hDels influence molecular and cellular processes in their native genomic contexts remains unexplored. Here, we design genome-scale libraries of single-guide RNAs targeting 7.2 megabases of sequence in 6,358 hDels and present a systematic CRISPR interference (CRISPRi) screening approach to identify hDels that modify cellular proliferation in chimpanzee pluripotent stem cells. By intersecting hDels with chromatin state features and performing single-cell CRISPRi (Perturb-seq) to identify their cis- and trans-regulatory target genes, we discovered 19 hDels controlling gene expression. We highlight two hDels, hDel_2247 and hDel_585, with tissue-specific activity in the liver and brain, respectively. Our findings reveal a molecular and cellular role for sequences lost in the human lineage and establish a framework for functionally interrogating human-specific genetic variants.
Affiliation(s)
- Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Bryan J Pavlovic
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Nathan K Schaefer
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| |
|
21
|
Schmitz D, Li Z, Lo Faro V, Rask-Andersen M, Ameur A, Rafati N, Johansson Å. Copy number variations and their effect on the plasma proteome. Genetics 2023; 225:iyad179. [PMID: 37793096 PMCID: PMC10697815 DOI: 10.1093/genetics/iyad179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 08/25/2023] [Accepted: 09/15/2023] [Indexed: 10/06/2023] Open
Abstract
Structural variations, including copy number variations (CNVs), affect around 20 million bases in the human genome and are common causes of rare conditions. CNVs are rarely investigated in complex disease research because most CNVs are not targeted on the genotyping arrays or the reference panels for genetic imputation. In this study, we characterize CNVs in a Swedish cohort (N = 1,021) using short-read whole-genome sequencing (WGS) and use long-read WGS for validation in a subcohort (N = 15), and explore their effect on 438 plasma proteins. We detected 184,182 polymorphic CNVs and identified 15 CNVs to be associated with 16 proteins (P < 8.22 × 10⁻¹⁰). Of these, 5 CNVs could be perfectly validated using long-read sequencing, including a CNV which was associated with measurements of the osteoclast-associated immunoglobulin-like receptor (OSCAR) and located upstream of OSCAR, a gene important for bone health. Two other CNVs were identified to be clusters of many short repetitive elements and another represented a complex rearrangement including an inversion. Our findings provide insights into the structure of common CNVs and their effects on the plasma proteome, and highlight the importance of investigating common CNVs, also in relation to complex diseases.
Affiliation(s)
- Daniel Schmitz
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
| | - Zhiwei Li
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
| | - Valeria Lo Faro
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
| | - Mathias Rask-Andersen
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
| | - Adam Ameur
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
| | - Nima Rafati
- Department of Medical Biochemistry and Microbiology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Box 582, 751 23 Uppsala, Sweden
| | - Åsa Johansson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
| |
|
22
|
Kore H, Datta KK, Nagaraj SH, Gowda H. Protein-coding potential of non-canonical open reading frames in human transcriptome. Biochem Biophys Res Commun 2023; 684:149040. [PMID: 37897910 DOI: 10.1016/j.bbrc.2023.09.068] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2023] [Revised: 09/09/2023] [Accepted: 09/23/2023] [Indexed: 10/30/2023]
Abstract
In recent years, proteogenomics and ribosome profiling studies have identified a large number of proteins encoded by noncoding regions in the human genome. They are encoded by small open reading frames (sORFs) in the untranslated regions (UTRs) of mRNAs and long non-coding RNAs (lncRNAs). These sORF encoded proteins (SEPs) are often <150AA and show poor evolutionary conservation. A subset of them have been functionally characterized and shown to play an important role in fundamental biological processes including cardiac and muscle function, DNA repair, embryonic development and various human diseases. How many novel protein-coding regions exist in the human genome and what fraction of them are functionally important remains a mystery. In this review, we discuss current progress in unraveling SEPs, approaches used for their identification, their limitations and reliability of these identifications. We also discuss functionally characterized SEPs and their involvement in various biological processes and diseases. Lastly, we provide insights into their distinctive features compared to canonical proteins and challenges associated with annotating these in protein reference databases.
Affiliation(s)
- Hitesh Kore
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Cancer Precision Medicine Group, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland, 4006, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia.
- Keshava K Datta
- Proteomics and Metabolomics Platform, La Trobe University, Melbourne, VIC, 3083, Australia
- Shivashankar H Nagaraj
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia
- Harsha Gowda
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Cancer Precision Medicine Group, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland, 4006, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Faculty of Medicine, The University of Queensland, Queensland, 4072, Australia.
23
Kyriazis CC, Robinson JA, Lohmueller KE. Using Computational Simulations to Model Deleterious Variation and Genetic Load in Natural Populations. Am Nat 2023; 202:737-752. [PMID: 38033186 PMCID: PMC10897732 DOI: 10.1086/726736] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/02/2023]
Abstract
Deleterious genetic variation is abundant in wild populations, and understanding the ecological and conservation implications of such variation is an area of active research. Genomic methods are increasingly used to quantify the impacts of deleterious variation in natural populations; however, these approaches remain limited by an inability to accurately predict the selective and dominance effects of mutations. Computational simulations of deleterious variation offer a complementary tool that can help overcome these limitations, although such approaches have yet to be widely employed. In this perspective article, we aim to encourage ecological and conservation genomics researchers to adopt greater use of computational simulations to aid in deepening our understanding of deleterious variation in natural populations. We first provide an overview of the components of a simulation of deleterious variation, describing the key parameters involved in such models. Next, we discuss several approaches for validating simulation models. Finally, we compare and validate several recently proposed deleterious mutation models, demonstrating that models based on estimates of selection parameters from experimental systems are biased toward highly deleterious mutations. We describe a new model that is supported by multiple orthogonal lines of evidence and provide example scripts for implementing this model (https://github.com/ckyriazis/simulations_review).
24
Sopic M, Vilne B, Gerdts E, Trindade F, Uchida S, Khatib S, Wettinger SB, Devaux Y, Magni P. Multiomics tools for improved atherosclerotic cardiovascular disease management. Trends Mol Med 2023; 29:983-995. [PMID: 37806854 DOI: 10.1016/j.molmed.2023.09.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/02/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/10/2023]
Abstract
Multiomics studies offer accurate preventive and therapeutic strategies for atherosclerotic cardiovascular disease (ASCVD) beyond traditional risk factors. By using artificial intelligence (AI) and machine learning (ML) approaches, it is possible to integrate multiple 'omics and clinical data sets into tools that can be utilized for the development of personalized diagnostic and therapeutic approaches. However, currently multiple challenges in data quality, integration, and privacy still need to be addressed. In this opinion, we emphasize that joined efforts, exemplified by the AtheroNET COST Action, have a pivotal role in overcoming the challenges to advance multiomics approaches in ASCVD research, with the aim to foster more precise and effective patient care.
Affiliation(s)
- Miron Sopic
- Cardiovascular Research Unit, Department of Precision Health, 1A-B rue Edison, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg; Department of Medical Biochemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, 11000, Serbia
- Baiba Vilne
- Bioinformatics Laboratory, Rīga Stradiņš University, Rīga, LV-1007, Latvia
- Eva Gerdts
- Center for Research on Cardiac Disease in Women, Department of Clinical Science, University of Bergen, Bergen, 5020, Norway
- Fábio Trindade
- Cardiovascular R&D Centre - UnIC@RISE, Department of Surgery and Physiology, Faculty of Medicine of the University of Porto, Porto, 4099-002, Portugal
- Shizuka Uchida
- Center for RNA Medicine, Department of Clinical Medicine, Aalborg University, Copenhagen, SV, DK-2450, Denmark
- Soliman Khatib
- Natural Compounds and Analytical Chemistry Laboratory, MIGAL-Galilee Research Institute, Kiryat Shemona, 11016, Israel; Department of Biotechnology, Tel-Hai College, Upper Galilee 12210, Israel
- Stephanie Bezzina Wettinger
- Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, 2080, Malta
- Yvan Devaux
- Cardiovascular Research Unit, Department of Precision Health, 1A-B rue Edison, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg
- Paolo Magni
- Department of Pharmacological and Biomolecular Sciences 'Rodolfo Paoletti', Università degli Studi di Milano, Via G. Balzaretti 9, 20133 Milano, Italy; IRCCS MultiMedica, Via Milanese 300, 20099 Sesto S. Giovanni, Milan, Italy
25
Miga KH, Eichler EE. Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes. Am J Hum Genet 2023; 110:1832-1840. [PMID: 37922882 PMCID: PMC10645551 DOI: 10.1016/j.ajhg.2023.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 11/07/2023] Open
Abstract
Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation. Complete genetic variant discovery will transform how we map, catalog, and associate variation with human disease and fundamentally change our understanding of the genetic diversity of all humans.
Affiliation(s)
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
26
Bhati M, Mapel XM, Lloret-Villas A, Pausch H. Structural variants and short tandem repeats impact gene expression and splicing in bovine testis tissue. Genetics 2023; 225:iyad161. [PMID: 37655920 PMCID: PMC10627265 DOI: 10.1093/genetics/iyad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 06/05/2023] [Accepted: 08/24/2023] [Indexed: 09/02/2023] Open
Abstract
Structural variants (SVs) and short tandem repeats (STRs) are significant sources of genetic variation. However, the impacts of these variants on gene regulation have not been investigated in cattle. Here, we genotyped and characterized 19,408 SVs and 374,821 STRs in 183 bovine genomes and investigated their impact on molecular phenotypes derived from testis transcriptomes. We found that 71% STRs were multiallelic. The vast majority (95%) of STRs and SVs were in intergenic and intronic regions. Only 37% SVs and 40% STRs were in high linkage disequilibrium (LD) (R2 > 0.8) with surrounding SNPs/insertions and deletions (Indels), indicating that SNP-based association testing and genomic prediction are blind to a nonnegligible portion of genetic variation. We showed that both SVs and STRs were more than 2-fold enriched among expression and splicing QTL (e/sQTL) relative to SNPs/Indels and were often associated with differential expression and splicing of multiple genes. Deletions and duplications had larger impacts on splicing and expression than any other type of SV. Exonic duplications predominantly increased gene expression either through alternative splicing or other mechanisms, whereas expression- and splicing-associated STRs primarily resided in intronic regions and exhibited bimodal effects on the molecular phenotypes investigated. Most e/sQTL resided within 100 kb of the affected genes or splicing junctions. We pinpoint candidate causal STRs and SVs associated with the expression of SLC13A4 and TTC7B and alternative splicing of a lncRNA and CAPP1. We provide a catalog of STRs and SVs for taurine cattle and show that these variants contribute substantially to gene expression and splicing variation.
Affiliation(s)
- Meenu Bhati
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
- Xena Marie Mapel
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
- Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
27
Paparella A, L’Abbate A, Palmisano D, Chirico G, Porubsky D, Catacchio CR, Ventura M, Eichler EE, Maggiolini FAM, Antonacci F. Structural Variation Evolution at the 15q11-q13 Disease-Associated Locus. Int J Mol Sci 2023; 24:15818. [PMID: 37958807 PMCID: PMC10648317 DOI: 10.3390/ijms242115818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023] Open
Abstract
The impact of segmental duplications on human evolution and disease is only just starting to unfold, thanks to advancements in sequencing technologies that allow for their discovery and precise genotyping. The 15q11-q13 locus is a hotspot of recurrent copy number variation associated with Prader-Willi/Angelman syndromes, developmental delay, autism, and epilepsy and is mediated by complex segmental duplications, many of which arose recently during evolution. To gain insight into the instability of this region, we characterized its architecture in human and nonhuman primates, reconstructing the evolutionary history of five different inversions that rearranged the region in different species primarily by accumulation of segmental duplications. Comparative analysis of human and nonhuman primate duplication structures suggests a human-specific gain of directly oriented duplications in the regions flanking the GOLGA cores and HERC segmental duplications, representing potential genomic drivers for the human-specific expansions. The increasing complexity of segmental duplication organization over the course of evolution underlies its association with human susceptibility to recurrent disease-associated rearrangements.
Affiliation(s)
- Annalisa Paparella
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Alberto L’Abbate
- Institute of Biomembranes, Bioenergetics, and Molecular Biotechnology (IBIOM), 70125 Bari, Italy
- Donato Palmisano
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Gerardina Chirico
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Claudia R. Catacchio
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Howard Hughes Medical Institute (HHMI), University of Washington, Seattle, WA 98195, USA
- Flavia A. M. Maggiolini
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Research Centre for Viticulture and Enology, Council for Agricultural Research and Economics (CREA), 70010 Bari, Italy
- Francesca Antonacci
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
28
Brown BC, Morris JA, Lappalainen T, Knowles DA. Large-scale causal discovery using interventional data sheds light on the regulatory network architecture of blood traits. bioRxiv 2023:2023.10.13.562293. [PMID: 37905013 PMCID: PMC10614812 DOI: 10.1101/2023.10.13.562293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Inference of directed biological networks is an important but notoriously challenging problem. We introduce inverse sparse regression (inspre), an approach to learning causal networks that leverages large-scale intervention-response data. Applied to 788 genes from the genome-wide perturb-seq dataset, inspre helps elucidate the network architecture of blood traits.
Affiliation(s)
- Brielin C. Brown
- New York Genome Center, New York, NY, USA
- Data Science Institute, Columbia University, New York, NY, USA
- Tuuli Lappalainen
- New York Genome Center, New York, NY, USA
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Systems Biology, Columbia University, New York, NY
- David A. Knowles
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY
- Department of Computer Science, Columbia University, New York, NY
29
Abdi M, Aliyev E, Trost B, Kohailan M, Aamer W, Syed N, Shaath R, Gandhi GD, Engchuan W, Howe J, Thiruvahindrapuram B, Geng M, Whitney J, Syed A, Lakshmi J, Hussein S, Albashir N, Hussein A, Poggiolini I, Elhag SF, Palaniswamy S, Kambouris M, de Fatima Janjua M, Tahir MOE, Nazeer A, Shahwar D, Azeem MW, Mokrab Y, Aati NA, Akil A, Scherer SW, Kamal M, Fakhro KA. Genomic architecture of autism spectrum disorder in Qatar: The BARAKA-Qatar Study. Genome Med 2023; 15:81. [PMID: 37805537 PMCID: PMC10560429 DOI: 10.1186/s13073-023-01228-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/02/2023] [Accepted: 09/04/2023] [Indexed: 10/09/2023] Open
Abstract
BACKGROUND: Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by impaired social and communication skills, restricted interests, and repetitive behaviors. The prevalence of ASD among children in Qatar was recently estimated to be 1.1%, though the genetic architecture underlying ASD both in Qatar and the greater Middle East has been largely unexplored. Here, we describe the first genomic data release from the BARAKA-Qatar Study, a nationwide program building a broadly consented biorepository of individuals with ASD and their families available for sample and data sharing and multi-omics research.
METHODS: In this first release, we present a comprehensive analysis of whole-genome sequencing (WGS) data of the first 100 families (372 individuals), investigating the genetic architecture, including single-nucleotide variants (SNVs), copy number variants (CNVs), tandem repeat expansions (TREs), as well as mitochondrial DNA variants (mtDNA) segregating with ASD in local families.
RESULTS: Overall, we identify potentially pathogenic variants in known genes or regions in 27 out of 100 families (27%), of which 11 variants (40.7%) were classified as pathogenic or likely-pathogenic based on American College of Medical Genetics (ACMG) guidelines. Dominant variants, including de novo and inherited, contributed to 15 (55.6%) of these families, consisting of SNVs/indels (66.7%), CNVs (13.3%), TREs (13.3%), and mtDNA variants (6.7%). Moreover, homozygous variants were found in 7 families (25.9%), with a sixfold increase in homozygous burden in consanguineous versus non-consanguineous families (13.6% and 1.8%, respectively). Furthermore, 28 novel ASD candidate genes were identified in 20 families, 23 of which had recurrent hits in MSSNG and SSC cohorts.
CONCLUSIONS: This study illustrates the value of ASD studies in under-represented populations and the importance of WGS as a comprehensive tool for establishing a molecular diagnosis for families with ASD. Moreover, it uncovers a significant role for recessive variation in ASD architecture in consanguineous settings and provides a unique resource of Middle Eastern genomes for future research to the global ASD community.
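The percentages quoted in the results of this abstract can be reproduced from the stated counts. The short check below is only an illustrative sketch: it assumes that the 27 families with findings are the denominator for the 40.7%, 55.6% and 25.9% figures, which the abstract implies but does not state explicitly, and the variable and function names are hypothetical.

```python
# Counts taken from the abstract; the choice of denominator (27) is an assumption.
total_families = 100
families_with_findings = 27
stated_counts = {
    "pathogenic_or_likely_pathogenic": 11,
    "dominant": 15,
    "homozygous": 7,
}


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Share of all families in which a potentially pathogenic variant was found.
overall = pct(families_with_findings, total_families)

# Shares relative to the 27 families with findings.
shares = {name: pct(n, families_with_findings) for name, n in stated_counts.items()}

print(overall, shares)
```

Under that assumption the rounded values agree with the figures reported in the abstract.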
Affiliation(s)
- Mona Abdi
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Elbay Aliyev
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Brett Trost
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Waleed Aamer
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Najeeb Syed
- Genomics Data Science Core, Sidra Medicine, Doha, Qatar
- Rulan Shaath
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Worrawat Engchuan
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Jennifer Howe
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Bhooma Thiruvahindrapuram
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Melissa Geng
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Joe Whitney
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Amira Syed
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Sura Hussein
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Amal Hussein
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Saba F Elhag
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Hamad Medical Corporation, Doha, Qatar
- Marios Kambouris
- Pathology and Laboratory Medicine Department, Genetics Division, Sidra Medicine, Doha, Qatar
- Ahsan Nazeer
- Department of Psychiatry, Sidra Medicine, Doha, Qatar
- Weill Cornell Medicine, Doha, Qatar
- Durre Shahwar
- Department of Psychiatry, Sidra Medicine, Doha, Qatar
- Weill Cornell Medicine, Doha, Qatar
- Muhammad Waqar Azeem
- Department of Psychiatry, Sidra Medicine, Doha, Qatar
- Weill Cornell Medicine, Doha, Qatar
- Younes Mokrab
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Department of Genetic Medicine, Weill Cornell Medicine, Doha, Qatar
- Qatar University, Doha, Qatar
- Ammira Akil
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Stephen W Scherer
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- McLaughlin Centre, University of Toronto, Toronto, ON, Canada
- Madeeha Kamal
- Department of Pediatrics, Sidra Medicine, Doha, Qatar
- Khalid A Fakhro
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- Department of Genetics, Sidra Medicine, Doha, Qatar
- Department of Genetic Medicine, Weill Cornell Medicine, Doha, Qatar
30
Lee YL, Bouwman AC, Harland C, Bosse M, Costa Monteiro Moreira G, Veerkamp RF, Mullaart E, Cambisano N, Groenen MAM, Karim L, Coppieters W, Georges M, Charlier C. The rate of de novo structural variation is increased in in vitro-produced offspring and preferentially affects the paternal genome. Genome Res 2023; 33:1455-1464. [PMID: 37793781 PMCID: PMC10620045 DOI: 10.1101/gr.277884.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 08/08/2023] [Indexed: 10/06/2023]
Abstract
Assisted reproductive technologies (ARTs), including in vitro maturation and fertilization (IVF), are increasingly used in human and animal reproduction. Whether these technologies directly affect the rate of de novo mutation (DNM), and to what extent, has been a matter of debate. Here we take advantage of domestic cattle, characterized by complex pedigrees that are ideally suited to detect DNMs and by the systematic use of ART, to study the rate of de novo structural variation (dnSV) in this species and how it is impacted by IVF. By exploiting features of associated de novo point mutations (dnPMs) and dnSVs in clustered DNMs, we provide strong evidence that (1) IVF increases the rate of dnSV approximately fivefold, and (2) the corresponding mutations occur during the very early stages of embryonic development (one- and two-cell stage), yet primarily affect the paternal genome.
Affiliation(s)
- Young-Lim Lee
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
- Wageningen University and Research, Animal Breeding and Genomics, 6708 WG Wageningen, The Netherlands
- Aniek C Bouwman
- Wageningen University and Research, Animal Breeding and Genomics, 6708 WG Wageningen, The Netherlands
- Chad Harland
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
- Livestock Improvement Corporation, Hamilton 3240, New Zealand
- Mirte Bosse
- Wageningen University and Research, Animal Breeding and Genomics, 6708 WG Wageningen, The Netherlands
- Roel F Veerkamp
- Wageningen University and Research, Animal Breeding and Genomics, 6708 WG Wageningen, The Netherlands
- Nadine Cambisano
- GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Martien A M Groenen
- Wageningen University and Research, Animal Breeding and Genomics, 6708 WG Wageningen, The Netherlands
- Latifa Karim
- GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Wouter Coppieters
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
- GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
- Michel Georges
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
- Carole Charlier
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
31
Lee H, Greer SU, Pavlichin DS, Zhou B, Urban AE, Weissman T, Ji HP. Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. Cell Rep Methods 2023; 3:100543. [PMID: 37671027 PMCID: PMC10475782 DOI: 10.1016/j.crmeth.2023.100543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 04/14/2023] [Accepted: 07/06/2023] [Indexed: 09/07/2023]
Abstract
The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.
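The idea of finding sequences conserved across every assembly through a k-mer index can be illustrated with a toy example. The sketch below is not the authors' PST pipeline: the function names, the toy sequences and the choice of k = 4 are all hypothetical, and it simply intersects the k-mer sets of a few short strings.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pan_conserved_kmers(assemblies: Dict[str, str], k: int) -> Set[str]:
    """Return the k-mers present in every assembly (a toy stand-in for PSTs)."""
    sets = [kmers(seq, k) for seq in assemblies.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy "assemblies"; real inputs would be whole-genome sequences.
toy_assemblies = {
    "asm_a": "ACGTACGGA",
    "asm_b": "TTACGGAC",
    "asm_c": "GACGGATT",
}

conserved = pan_conserved_kmers(toy_assemblies, k=4)
print(sorted(conserved))  # ['ACGG', 'CGGA']
```

At genome scale one would need a memory-efficient index and would probably also require each tag to be unique within an assembly; the set intersection here only shows the core notion of "present in all assemblies".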
Affiliation(s)
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Stephanie U. Greer
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Dmitri S. Pavlichin
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Alexander E. Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
- Hanlee P. Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
32
Alibutud R, Hansali S, Cao X, Zhou A, Mahaganapathy V, Azaro M, Gwin C, Wilson S, Buyske S, Bartlett CW, Flax JF, Brzustowicz LM, Xing J. Structural Variations Contribute to the Genetic Etiology of Autism Spectrum Disorder and Language Impairments. Int J Mol Sci 2023; 24:13248. [PMID: 37686052 PMCID: PMC10487745 DOI: 10.3390/ijms241713248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Revised: 08/24/2023] [Accepted: 08/25/2023] [Indexed: 09/10/2023] Open
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by restrictive interests and/or repetitive behaviors and deficits in social interaction and communication. ASD is a multifactorial disease with a complex polygenic genetic architecture. Its genetic contributing factors are not yet fully understood, especially large structural variations (SVs). In this study, we aimed to assess the contribution of SVs, including copy number variants (CNVs), insertions, deletions, duplications, and mobile element insertions, to ASD and related language impairments in the New Jersey Language and Autism Genetics Study (NJLAGS) cohort. Within the cohort, ~77% of the families contain SVs that followed expected segregation or de novo patterns and passed our filtering criteria. These SVs affected 344 brain-expressed genes and can potentially contribute to the genetic etiology of the disorders. Gene Ontology and protein-protein interaction network analysis suggested several clusters of genes in different functional categories, such as neuronal development and histone modification machinery. Genes and biological processes identified in this study contribute to the understanding of ASD and related neurodevelopment disorders.
Affiliation(s)
- Rohan Alibutud
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; (R.A.); (S.H.); (X.C.); (A.Z.); (V.M.); (M.A.); (C.G.); (S.W.); (J.F.F.); (L.M.B.)
| | - Sammy Hansali
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; (R.A.); (S.H.); (X.C.); (A.Z.); (V.M.); (M.A.); (C.G.); (S.W.); (J.F.F.); (L.M.B.)
| | - Xiaolong Cao
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; (R.A.); (S.H.); (X.C.); (A.Z.); (V.M.); (M.A.); (C.G.); (S.W.); (J.F.F.); (L.M.B.)
| | - Anbo Zhou
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; (R.A.); (S.H.); (X.C.); (A.Z.); (V.M.); (M.A.); (C.G.); (S.W.); (J.F.F.); (L.M.B.)
| | - Vaidhyanathan Mahaganapathy
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; (R.A.); (S.H.); (X.C.); (A.Z.); (V.M.); (M.A.); (C.G.); (S.W.); (J.F.F.); (L.M.B.)
| | - Marco Azaro
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; (R.A.); (S.H.); (X.C.); (A.Z.); (V.M.); (M.A.); (C.G.); (S.W.); (J.F.F.); (L.M.B.)
| | - Christine Gwin
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; (R.A.); (S.H.); (X.C.); (A.Z.); (V.M.); (M.A.); (C.G.); (S.W.); (J.F.F.); (L.M.B.)
| | - Sherri Wilson
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; (R.A.); (S.H.); (X.C.); (A.Z.); (V.M.); (M.A.); (C.G.); (S.W.); (J.F.F.); (L.M.B.)
| | - Steven Buyske
- Department of Statistics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA;
| | - Christopher W. Bartlett
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, College of Medicine, The Ohio State University, Columbus, OH 43205, USA
| | - Judy F. Flax
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Linda M. Brzustowicz
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- The Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jinchuan Xing
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- The Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
|
33
|
Zhang X, Brody JA, Graff M, Highland HM, Chami N, Xu H, Wang Z, Ferrier K, Chittoor G, Josyula NS, Li X, Li Z, Allison MA, Becker DM, Bielak LF, Bis JC, Boorgula MP, Bowden DW, Broome JG, Buth EJ, Carlson CS, Chang KM, Chavan S, Chiu YF, Chuang LM, Conomos MP, DeMeo DL, Du M, Duggirala R, Eng C, Fohner AE, Freedman BI, Garrett ME, Guo X, Haiman C, Heavner BD, Hidalgo B, Hixson JE, Ho YL, Hobbs BD, Hu D, Hui Q, Hwu CM, Jackson RD, Jain D, Kalyani RR, Kardia SL, Kelly TN, Lange EM, LeNoir M, Li C, Marchand LL, McDonald MLN, McHugh CP, Morrison AC, Naseri T, O’Connell J, O’Donnell CJ, Palmer ND, Pankow JS, Perry JA, Peters U, Preuss MH, Rao D, Regan EA, Reupena SM, Roden DM, Rodriguez-Santana J, Sitlani CM, Smith JA, Tiwari HK, Vasan RS, Wang Z, Weeks DE, Wessel J, Wiggins KL, Wilkens LR, Wilson PW, Yanek LR, Yoneda ZT, Zhao W, Zöllner S, Arnett DK, Ashley-Koch AE, Barnes KC, Blangero J, Boerwinkle E, Burchard EG, Carson AP, Chasman DI, Chen YDI, Curran JE, Fornage M, Gordeuk VR, He J, Heckbert SR, Hou L, Irvin MR, Kooperberg C, Minster RL, Mitchell BD, Nouraie M, Psaty BM, Raffield LM, Reiner AP, Rich SS, Rotter JI, Shoemaker MB, Smith NL, Taylor KD, Telen MJ, Weiss ST, Zhang Y, Heard-Costa N, Sun YV, Lin X, Adrienne Cupples L, Lange LA, Liu CT, Loos RJ, North KE, Justice AE. WHOLE GENOME SEQUENCING ANALYSIS OF BODY MASS INDEX IDENTIFIES NOVEL AFRICAN ANCESTRY-SPECIFIC RISK ALLELE. medRxiv 2023:2023.08.21.23293271. [PMID: 37662265 PMCID: PMC10473809 DOI: 10.1101/2023.08.21.23293271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Obesity is a major public health crisis associated with high mortality rates. Previous genome-wide association studies (GWAS) investigating body mass index (BMI) have largely relied on imputed data from European individuals. This study leveraged whole-genome sequencing (WGS) data from 88,873 participants from the Trans-Omics for Precision Medicine (TOPMed) Program, of which 51% were of non-European population groups. We discovered 18 BMI-associated signals (P < 5 × 10⁻⁹). Notably, we identified and replicated a novel low-frequency single nucleotide polymorphism (SNP) in MTMR3 that was common in individuals of African descent. Using a diverse study population, we further identified two novel secondary signals in known BMI loci and pinpointed two likely causal variants in the POC5 and DMD loci. Our work demonstrates the benefits of combining WGS and diverse cohorts in expanding the current catalog of variants and genes that confer risk for obesity, bringing us one step closer to personalized medicine.
Affiliation(s)
- Xinruo Zhang
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jennifer A. Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Heather M. Highland
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Nathalie Chami
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Hanfei Xu
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
| | - Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Kendra Ferrier
- Division of Biomedical Informatics and Personalized Medicine, School of Medicine University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | | | | | - Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Zilin Li
- Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Matthew A. Allison
- Department of Family Medicine, Division of Preventive Medicine, The University of California San Diego, La Jolla, CA, USA
| | - Diane M. Becker
- Department of Medicine, General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Lawrence F. Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | | | - Donald W. Bowden
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Jai G. Broome
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA, USA
| | - Erin J. Buth
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Christopher S. Carlson
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Kyong-Mi Chang
- The Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Sameer Chavan
- Department of Medicine, School of Medicine, University of Colorado, Aurora, CO, USA
| | - Yen-Feng Chiu
- Institute of Population Health Sciences, National Health Research Institutes, Taipei, Taiwan
| | - Lee-Ming Chuang
- Department of Internal Medicine, Division of Metabolism/Endocrinology, National Taiwan University Hospital, Taipei, Taiwan
| | - Matthew P. Conomos
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Dawn L. DeMeo
- Department of Medicine, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Margaret Du
- Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Ravindranath Duggirala
- Life Sciences, College of Arts and Sciences, Texas A&M University-San Antonio, San Antonio, TX, USA
| | - Celeste Eng
- Department of Medicine, Lung Biology Center, University of California, San Francisco, San Francisco, CA, USA
| | - Alison E. Fohner
- Epidemiology, Institute of Public Health Genetics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Barry I. Freedman
- Internal Medicine, Section on Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Melanie E. Garrett
- Department of Medicine, Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
| | - Xiuqing Guo
- Department of Pediatrics, Genomic Outcomes, The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Chris Haiman
- Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Benjamin D. Heavner
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Bertha Hidalgo
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
| | - James E. Hixson
- Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Yuk-Lam Ho
- Veterans Affairs Boston Healthcare System, Boston, MA, USA
| | - Brian D. Hobbs
- Department of Medicine, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Donglei Hu
- Department of Medicine, Lung Biology Center, University of California, San Francisco, San Francisco, CA, USA
| | - Qin Hui
- Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
- Atlanta VA Health Care System, Decatur, GA, USA
| | - Chii-Min Hwu
- Department of Medicine, Division of Endocrinology and Metabolism, Taipei Veterans General Hospital, Taipei, Taiwan
| | | | - Deepti Jain
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Rita R. Kalyani
- Department of Medicine, Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Sharon L.R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Tanika N. Kelly
- Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
| | - Ethan M. Lange
- Division of Biomedical Informatics and Personalized Medicine, School of Medicine University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Michael LeNoir
- Department of Pediatrics, Bay Area Pediatrics, Oakland, CA, USA
| | - Changwei Li
- Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
| | - Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Merry-Lynn N. McDonald
- Department of Medicine, Pulmonary, Allergy and Critical Care, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Caitlin P. McHugh
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Alanna C. Morrison
- Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Take Naseri
- Ministry of Health, Government of Samoa, Apia, Samoa
| | | | - Jeffrey O’Connell
- Department of Medicine, Program for Personalized and Genomic Medicine, University of Maryland, Baltimore, MD, USA
| | - Christopher J. O’Donnell
- Veterans Affairs Boston Healthcare System, Boston, MA, USA
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Nicholette D. Palmer
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - James S. Pankow
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - James A. Perry
- Department of Medicine, School of Medicine, University of Maryland, Baltimore, MD, USA
| | - Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Michael H. Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - D.C. Rao
- Division of Biostatistics, Washington University in St. Louis, St. Louis, MO, USA
| | - Elizabeth A. Regan
- Department of Medicine, Rheumatology, National Jewish Health, Denver, CO, USA
| | | | - Dan M. Roden
- Medicine, Pharmacology, and Biomedical Informatics, Clinical Pharmacology and Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | | | - Colleen M. Sitlani
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Jennifer A. Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Hemant K. Tiwari
- Department of Biostatistics, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
| | | | - Zeyuan Wang
- Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Daniel E. Weeks
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Jennifer Wessel
- Department of Epidemiology, Indiana University, Indianapolis, IN, USA
- Department of Medicine, Indiana University, Indianapolis, IN, USA
- Diabetes Translational Research Center, Indiana University, Indianapolis, IN, USA
| | - Kerri L. Wiggins
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Lynne R. Wilkens
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Peter W.F. Wilson
- Atlanta VA Health Care System, Decatur, GA, USA
- Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA
| | - Lisa R. Yanek
- Department of Medicine, General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Zachary T. Yoneda
- Department of Medicine, Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Sebastian Zöllner
- Department of Biostatistics, Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
| | - Donna K. Arnett
- Department of Epidemiology, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
| | - Allison E. Ashley-Koch
- Department of Medicine, Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
| | - Kathleen C. Barnes
- Department of Medicine, School of Medicine, University of Colorado, Aurora, CO, USA
| | - John Blangero
- Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Eric Boerwinkle
- Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Esteban G. Burchard
- Bioengineering and Therapeutic Sciences and Medicine, Lung Biology Center, University of California, San Francisco, San Francisco, CA, USA
| | - April P. Carson
- Department of Medicine, University of Mississippi, Jackson, MS, USA
| | - Daniel I. Chasman
- Division of Preventive Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Yii-Der Ida Chen
- Department of Medical Genetics, Genomic Outcomes, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Joanne E. Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Myriam Fornage
- Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Victor R. Gordeuk
- Department of Medicine, School of Medicine, University of Illinois at Chicago, Chicago, IL, USA
| | - Jiang He
- Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
| | - Susan R. Heckbert
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - Lifang Hou
- Northwestern University, Chicago, IL, USA
| | - Marguerite R. Irvin
- Department of Epidemiology, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Ryan L. Minster
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Braxton D. Mitchell
- Department of Medicine, Division of Endocrinology, Diabetes and Nutrition, University of Maryland, Baltimore, MD, USA
| | - Mehdi Nouraie
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
| | - Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | | | - Stephen S. Rich
- Public Health Science, Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Jerome I. Rotter
- Department of Pediatrics, Genomic Outcomes, The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - M. Benjamin Shoemaker
- Department of Medicine, Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Nicholas L. Smith
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA
- Seattle Epidemiologic Research and Information Center, Office of Research and Development, Department of Veterans Affairs, Seattle, WA, USA
| | - Kent D. Taylor
- Department of Pediatrics, Genomic Outcomes, The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Marilyn J. Telen
- Department of Medicine, Hematology, Duke University Medical Center, Durham, NC, USA
| | - Scott T. Weiss
- Department of Medicine, Channing Division of Network Medicine, Harvard Medical School, Boston, MA, USA
| | - Yingze Zhang
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Nancy Heard-Costa
- Framingham Heart Study, School of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
| | - Yan V. Sun
- Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
- Atlanta VA Health Care System, Decatur, GA, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Boston, MA, USA
| | - L. Adrienne Cupples
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
| | - Leslie A. Lange
- Division of Biomedical Informatics and Personalized Medicine, School of Medicine University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Ching-Ti Liu
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
| | - Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Science, University of Copenhagen, Copenhagen, Denmark
| | - Kari E. North
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
34
|
Zhou B, He Y, Chen Y, Su B. Comparative Genomic Analysis Identifies Great-Ape-Specific Structural Variants and Their Evolutionary Relevance. Mol Biol Evol 2023; 40:msad184. [PMID: 37565562 PMCID: PMC10461412 DOI: 10.1093/molbev/msad184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 08/01/2023] [Accepted: 08/10/2023] [Indexed: 08/12/2023]
Abstract
During the origin of great apes about 14 million years ago, a series of phenotypic innovations emerged, such as the increased body size, the enlarged brain volume, the improved cognitive skill, and the diversified diet. Yet, the genomic basis of these evolutionary changes remains unclear. Utilizing the high-quality genome assemblies of great apes (including human), gibbon, and macaque, we conducted comparative genome analyses and identified 15,885 great ape-specific structural variants (GSSVs), including eight coding GSSVs resulting in the creation of novel proteins (e.g., ACAN and CMYA5). Functional annotations of the GSSV-related genes revealed the enrichment of genes involved in development and morphogenesis, especially neurogenesis and neural network formation, suggesting the potential role of GSSVs in shaping the great ape-shared traits. Further dissection of the brain-related GSSVs shows great ape-specific changes of enhancer activities and gene expression in the brain, involving a group of GSSV-regulated genes (such as NOL3) that potentially contribute to the altered brain development and function in great apes. The presented data highlight the evolutionary role of structural variants in the phenotypic innovations during the origin of the great ape lineage.
Affiliation(s)
- Bin Zhou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Yaoxi He
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yongjie Chen
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Bing Su
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
|
35
|
Montanucci L, Lewis-Smith D, Collins RL, Niestroj LM, Parthasarathy S, Xian J, Ganesan S, Macnee M, Brünger T, Thomas RH, Talkowski M, Helbig I, Leu C, Lal D. Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals. Nat Commun 2023; 14:4392. [PMID: 37474567 PMCID: PMC10359300 DOI: 10.1038/s41467-023-39539-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/03/2022] [Accepted: 06/16/2023] [Indexed: 07/22/2023]
Abstract
Copy number variants (CNVs) are established risk factors for neurodevelopmental disorders with seizures or epilepsy. With the hypothesis that seizure disorders share genetic risk factors, we pooled CNV data from 10,590 individuals with seizure disorders, 16,109 individuals with clinically validated epilepsy, and 492,324 population controls and identified 25 genome-wide significant loci, 22 of which are novel for seizure disorders, such as deletions at 1p36.33, 1q44, 2p21-p16.3, 3q29, 8p23.3-p23.2, 9p24.3, 10q26.3, 15q11.2, 15q12-q13.1, 16p12.2, and 17q21.31; duplications at 2q13, 9q34.3, 16p13.3, 17q12, 19p13.3, and 20q13.33; and reciprocal CNVs at 16p11.2 and 22q11.21. Using genetic data from an additional 248,751 individuals with 23 neuropsychiatric phenotypes, we explored the pleiotropy of these 25 loci. Finally, in a subset of individuals with epilepsy and detailed clinical data available, we performed phenome-wide association analyses between individual CNVs and clinical annotations categorized through the Human Phenotype Ontology (HPO). For six CNVs, we identified 19 significant associations with specific HPO terms and generated, for all CNVs, phenotype signatures across 17 clinical categories relevant for epileptologists. This is the most comprehensive investigation of CNVs in epilepsy and related seizure disorders, with potential implications for clinical practice.
Affiliation(s)
- Ludovica Montanucci
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, USA
| | - David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Clinical Neurosciences, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.) and Harvard, Cambridge, USA
| | | | - Shridhar Parthasarathy
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Julie Xian
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Shiva Ganesan
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Marie Macnee
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
| | - Tobias Brünger
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
| | - Rhys H Thomas
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Clinical Neurosciences, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
| | - Michael Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.) and Harvard, Cambridge, USA
| | - Ingo Helbig
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
| | - Costin Leu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, USA
- Department of Clinical and Experimental Epilepsy, Institute of Neurology, University College London, London, UK
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA
| | - Dennis Lal
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.) and Harvard, Cambridge, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA
|
36
|
Audano PA, Beck CR. Small allelic variants are a source of ancestral bias in structural variant breakpoint placement. bioRxiv 2023:2023.06.25.546295. [PMID: 37425850 PMCID: PMC10327140 DOI: 10.1101/2023.06.25.546295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. 
The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes.
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, USA
|
37
|
Kosugi S, Kamatani Y, Harada K, Tomizuka K, Momozawa Y, Morisaki T, Terao C. Detection of trait-associated structural variations using short-read sequencing. Cell Genom 2023; 3:100328. [PMID: 37388916 PMCID: PMC10300613 DOI: 10.1016/j.xgen.2023.100328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 02/17/2023] [Accepted: 04/25/2023] [Indexed: 07/01/2023]
Abstract
Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that includes missing call recovery combined with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, roughly 1.7- to 3.3-fold more than previous large-scale projects, while exhibiting a comparable level of statistical quality metrics. We imputed SVs from 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.
Affiliation(s)
- Shunichi Kosugi
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
| | - Yoichiro Kamatani
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba 277-8562, Japan
| | - Katsutoshi Harada
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
| | - Takayuki Morisaki
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
| | | | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
|
38
|
Kaivola K, Chia R, Ding J, Rasheed M, Fujita M, Menon V, Walton RL, Collins RL, Billingsley K, Brand H, Talkowski M, Zhao X, Dewan R, Stark A, Ray A, Solaiman S, Alvarez Jerez P, Malik L, Dawson TM, Rosenthal LS, Albert MS, Pletnikova O, Troncoso JC, Masellis M, Keith J, Black SE, Ferrucci L, Resnick SM, Tanaka T, Topol E, Torkamani A, Tienari P, Foroud TM, Ghetti B, Landers JE, Ryten M, Morris HR, Hardy JA, Mazzini L, D'Alfonso S, Moglia C, Calvo A, Serrano GE, Beach TG, Ferman T, Graff-Radford NR, Boeve BF, Wszolek ZK, Dickson DW, Chiò A, Bennett DA, De Jager PL, Ross OA, Dalgard CL, Gibbs JR, Traynor BJ, Scholz SW. Genome-wide structural variant analysis identifies risk loci for non-Alzheimer's dementias. Cell Genom 2023; 3:100316. [PMID: 37388914 PMCID: PMC10300553 DOI: 10.1016/j.xgen.2023.100316] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/23/2023] [Revised: 03/21/2023] [Accepted: 04/06/2023] [Indexed: 07/01/2023]
Abstract
We characterized the role of structural variants, a largely unexplored type of genetic variation, in two non-Alzheimer's dementias, namely Lewy body dementia (LBD) and frontotemporal dementia (FTD)/amyotrophic lateral sclerosis (ALS). To do this, we applied an advanced structural variant calling pipeline (GATK-SV) to short-read whole-genome sequence data from 5,213 European-ancestry cases and 4,132 controls. We discovered, replicated, and validated a deletion in TPCN1 as a novel risk locus for LBD and detected the known structural variants at the C9orf72 and MAPT loci as associated with FTD/ALS. We also identified rare pathogenic structural variants in both LBD and FTD/ALS. Finally, we assembled a catalog of structural variants that can be mined for new insights into the pathogenesis of these understudied forms of dementia.
Affiliation(s)
- Karri Kaivola
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
| | - Ruth Chia
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Jinhui Ding
- Computational Biology Group, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Memoona Rasheed
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Masashi Fujita
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, New York, NY, USA
| | - Vilas Menon
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, New York, NY, USA
| | - Ronald L. Walton
- Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road South, Jacksonville, FL, USA
| | - Ryan L. Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
- Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Kimberley Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Centre for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
- Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Michael Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
| | - Ramita Dewan
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Ali Stark
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Anindita Ray
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
| | - Sultana Solaiman
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
| | - Pilar Alvarez Jerez
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Centre for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Laksh Malik
- Centre for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Ted M. Dawson
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Neuroregeneration and Stem Cell Programs, Institute of Cell Engineering, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Department of Pharmacology and Molecular Science, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Liana S. Rosenthal
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Marilyn S. Albert
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Olga Pletnikova
- Department of Pathology and Anatomical Sciences, Jacobs School of Medicine and Biomedical Sciences, University of Buffalo, Buffalo, NY, USA
- Department of Pathology (Neuropathology), Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Juan C. Troncoso
- Department of Pathology (Neuropathology), Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Mario Masellis
- Cognitive & Movement Disorders Clinic, Sunnybrook Health Sciences Centre, University of Toronto, 1 King’s College Circle, Room 2374, Toronto, ON, Canada
- Division of Neurology, Department of Medicine, University of Toronto, Toronto, ON, Canada
- Hurvitz Brain Sciences Research Program, Sunnybrook Research Institute, University of Toronto, 2075 Bayview Avenue, Toronto, ON, Canada
- LC Campbell Cognitive Neurology Research Unit, Sunnybrook Research Institute, University of Toronto, 2075 Bayview Avenue, Toronto, ON, Canada
| | - Julia Keith
- Department of Anatomical Pathology, Sunnybrook Health Sciences Centre, University of Toronto, 1 King’s College Circle, Room 2374, Toronto, ON, Canada
| | - Sandra E. Black
- Division of Neurology, Department of Medicine, University of Toronto, Toronto, ON, Canada
- Hurvitz Brain Sciences Research Program, Sunnybrook Research Institute, University of Toronto, 2075 Bayview Avenue, Toronto, ON, Canada
- LC Campbell Cognitive Neurology Research Unit, Sunnybrook Research Institute, University of Toronto, 2075 Bayview Avenue, Toronto, ON, Canada
- Institute of Medical Science, Faculty of Medicine, University of Toronto, 1 King’s College Circle, Room 2374, Toronto, ON, Canada
- Heart and Stroke Foundation Canadian Partnership for Stroke Recovery, Sunnybrook Health Sciences Centre, University of Toronto, 1 King’s College Circle, Room 2374, Toronto, ON, Canada
| | - Luigi Ferrucci
- Longitudinal Studies Section, National Institute on Aging, Baltimore, MD, USA
| | - Susan M. Resnick
- Laboratory of Behavioral Neuroscience, National Institute on Aging, Baltimore, MD, USA
| | - Toshiko Tanaka
- Longitudinal Studies Section, National Institute on Aging, Baltimore, MD, USA
| | - PROSPECT Consortium
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Computational Biology Group, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, New York, NY, USA
- Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road South, Jacksonville, FL, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
- Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA, USA
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Centre for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Neuroregeneration and Stem Cell Programs, Institute of Cell Engineering, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Department of Pharmacology and Molecular Science, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Department of Pathology and Anatomical Sciences, Jacobs School of Medicine and Biomedical Sciences, University of Buffalo, Buffalo, NY, USA
- Department of Pathology (Neuropathology), Johns Hopkins University Medical Center, Baltimore, MD, USA
- Cognitive & Movement Disorders Clinic, Sunnybrook Health Sciences Centre, University of Toronto, 1 King’s College Circle, Room 2374, Toronto, ON, Canada
- Division of Neurology, Department of Medicine, University of Toronto, Toronto, ON, Canada
- Hurvitz Brain Sciences Research Program, Sunnybrook Research Institute, University of Toronto, 2075 Bayview Avenue, Toronto, ON, Canada
- LC Campbell Cognitive Neurology Research Unit, Sunnybrook Research Institute, University of Toronto, 2075 Bayview Avenue, Toronto, ON, Canada
- Department of Anatomical Pathology, Sunnybrook Health Sciences Centre, University of Toronto, 1 King’s College Circle, Room 2374, Toronto, ON, Canada
- Institute of Medical Science, Faculty of Medicine, University of Toronto, 1 King’s College Circle, Room 2374, Toronto, ON, Canada
- Heart and Stroke Foundation Canadian Partnership for Stroke Recovery, Sunnybrook Health Sciences Centre, University of Toronto, 1 King’s College Circle, Room 2374, Toronto, ON, Canada
- Longitudinal Studies Section, National Institute on Aging, Baltimore, MD, USA
- Laboratory of Behavioral Neuroscience, National Institute on Aging, Baltimore, MD, USA
- Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA
- Translational Immunology, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Department of Neurology, Helsinki University Hospital, Helsinki, Finland
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Neurology, University of Massachusetts Medical School, Worcester, MA, USA
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, University College London, London, UK
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
- UK Dementia Research Institute, Department of Neurodegenerative Disease and Reta Lila Weston Institute, London, UK
- Institute of Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Maggiore della Carita University Hospital, Novara, Italy
- Department of Health Sciences, University of Eastern Piedmont, Novara, Italy
- Rita Levi Montalcini Department of Neuroscience, University of Turin, Turin, Italy
- Azienda Ospedaliero Universitaria Città, della Salute e della Scienza, Corso Bramante, 88, Turin, Italy
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ, USA
- Department of Psychiatry and Psychology, Mayo Clinic, 4500 San Pablo Road South, Jacksonville, FL, USA
- Department of Neurology, Mayo Clinic, 4500 San Pablo Road South, Jacksonville, FL, USA
- Center for Sleep Medicine, Mayo Clinic, Rochester, MN, USA
- Institute of Cognitive Sciences and Technologies, C.N.R., Via S. Martino della Battaglia, 44, Rome, Italy
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Anatomy, Physiology and Genetics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- The American Genome Center, Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- RNA Therapeutics Laboratory, Therapeutics Development Branch, National Center for Advancing Translational Sciences, Rockville, MD, USA
| | - Eric Topol
- Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA
| | - Ali Torkamani
- Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA
| | - Pentti Tienari
- Translational Immunology, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Department of Neurology, Helsinki University Hospital, Helsinki, Finland
| | - Tatiana M. Foroud
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Bernardino Ghetti
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
| | - John E. Landers
- Department of Neurology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Mina Ryten
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, University College London, London, UK
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
| | - Huw R. Morris
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
| | - John A. Hardy
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
- UK Dementia Research Institute, Department of Neurodegenerative Disease and Reta Lila Weston Institute, London, UK
- Institute of Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | | | - Sandra D'Alfonso
- Department of Health Sciences, University of Eastern Piedmont, Novara, Italy
| | - Cristina Moglia
- Rita Levi Montalcini Department of Neuroscience, University of Turin, Turin, Italy
- Azienda Ospedaliero Universitaria Città, della Salute e della Scienza, Corso Bramante, 88, Turin, Italy
| | - Andrea Calvo
- Rita Levi Montalcini Department of Neuroscience, University of Turin, Turin, Italy
- Azienda Ospedaliero Universitaria Città, della Salute e della Scienza, Corso Bramante, 88, Turin, Italy
| | - Geidy E. Serrano
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ, USA
| | - Thomas G. Beach
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ, USA
| | - Tanis Ferman
- Department of Psychiatry and Psychology, Mayo Clinic, 4500 San Pablo Road South, Jacksonville, FL, USA
| | | | | | - Zbigniew K. Wszolek
- Institute of Cognitive Sciences and Technologies, C.N.R., Via S. Martino della Battaglia, 44, Rome, Italy
| | - Dennis W. Dickson
- Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road South, Jacksonville, FL, USA
| | - Adriano Chiò
- Rita Levi Montalcini Department of Neuroscience, University of Turin, Turin, Italy
- Azienda Ospedaliero Universitaria Città, della Salute e della Scienza, Corso Bramante, 88, Turin, Italy
- Institute of Cognitive Sciences and Technologies, C.N.R., Via S. Martino della Battaglia, 44, Rome, Italy
| | - David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Philip L. De Jager
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, New York, NY, USA
| | - Owen A. Ross
- Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road South, Jacksonville, FL, USA
| | - Clifton L. Dalgard
- Department of Anatomy, Physiology and Genetics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- The American Genome Center, Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - J. Raphael Gibbs
- Computational Biology Group, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Bryan J. Traynor
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- RNA Therapeutics Laboratory, Therapeutics Development Branch, National Center for Advancing Translational Sciences, Rockville, MD, USA
| | - Sonja W. Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
|
39
|
Hujoel ML, Handsaker RE, Sherman MA, Kamitaki N, Barton AR, Mukamel RE, Terao C, McCarroll SA, Loh PR. Hidden protein-altering variants influence diverse human phenotypes. bioRxiv 2023:2023.06.07.544066. [PMID: 37333244 PMCID: PMC10274781 DOI: 10.1101/2023.06.07.544066] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/20/2023]
Abstract
Structural variants (SVs) comprise the largest genetic variants, altering from 50 base pairs to megabases of DNA. However, SVs have not been effectively ascertained in most genetic association studies, leaving a key gap in our understanding of human complex trait genetics. We ascertained protein-altering SVs from UK Biobank whole-exome sequencing data (n=468,570) using haplotype-informed methods capable of detecting sub-exonic SVs and variation within segmental duplications. Incorporating SVs into analyses of rare variants predicted to cause gene loss-of-function (pLoF) identified 100 associations of pLoF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 appeared to confer one of the strongest protective effects of gene LoF on hypertension risk (OR = 0.86 [0.82-0.90]). Protein-coding variation in rapidly-evolving gene families within segmental duplications (previously invisible to most analysis methods) appeared to generate some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype, and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Affiliation(s)
- Margaux L.A. Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Robert E. Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard University, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Maxwell A. Sherman
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Nolan Kamitaki
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Alison R. Barton
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Ronen E. Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
| | - Steven A. McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard University, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
40
|
Rajaby R, Liu DX, Au CH, Cheung YT, Lau AYT, Yang QY, Sung WK. INSurVeyor: improving insertion calling from short read sequencing data. Nat Commun 2023; 14:3243. [PMID: 37277343 DOI: 10.1038/s41467-023-38870-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 05/18/2023] [Indexed: 06/07/2023] Open
Abstract
Insertions are one of the major types of structural variations and are defined as the addition of 50 nucleotides or more into a DNA sequence. Several methods exist to detect insertions from next-generation sequencing short-read data, but they generally have low sensitivity. Our contribution is two-fold. First, we introduce INSurVeyor, a fast, sensitive and precise method that detects insertions from next-generation sequencing paired-end data. Using publicly available benchmark datasets (both human and non-human), we show that INSurVeyor is not only more sensitive than any individual caller we tested, but also more sensitive than all of them combined. Furthermore, for most types of insertions, INSurVeyor is almost as sensitive as long-read callers. Second, we provide state-of-the-art catalogues of insertions for 1047 Arabidopsis thaliana genomes from the 1001 Genomes Project and 3202 human genomes from the 1000 Genomes Project, both generated with INSurVeyor. We show that they are more complete and precise than existing resources, and that important insertions are missed by existing methods.
Affiliation(s)
- Ramesh Rajaby
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- A*STAR Genome Institute of Singapore, 60 Biopolis Street, Singapore, 138672, Singapore
| | - Dong-Xu Liu
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Chun Hang Au
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
| | - Yuen-Ting Cheung
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
| | - Amy Yuet Ting Lau
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
| | - Qing-Yong Yang
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Wing-Kin Sung
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- A*STAR Genome Institute of Singapore, 60 Biopolis Street, Singapore, 138672, Singapore
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
- Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
- School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
|
41
|
Haas BJ, Dobin A, Ghandi M, Van Arsdale A, Tickle T, Robinson JT, Gillani R, Kasif S, Regev A. Targeted in silico characterization of fusion transcripts in tumor and normal tissues via FusionInspector. Cell Rep Methods 2023; 3:100467. [PMID: 37323575 PMCID: PMC10261907 DOI: 10.1016/j.crmeth.2023.100467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 02/28/2023] [Accepted: 04/14/2023] [Indexed: 06/17/2023]
Abstract
Here, we present FusionInspector for in silico characterization and interpretation of candidate fusion transcripts from RNA sequencing (RNA-seq) and exploration of their sequence and expression characteristics. We applied FusionInspector to thousands of tumor and normal transcriptomes and identified statistical and experimental features enriched among biologically impactful fusions. Through clustering and machine learning, we identified large collections of fusions potentially relevant to tumor and normal biological processes. We show that biologically relevant fusions are enriched for relatively high expression of the fusion transcript, imbalanced fusion allelic ratios, and canonical splicing patterns, and are deficient in sequence microhomologies between partner genes. We demonstrate that FusionInspector accurately validates fusion transcripts in silico and helps characterize numerous understudied fusions in tumor and normal tissue samples. FusionInspector is freely available as open source for screening, characterization, and visualization of candidate fusions via RNA-seq, and facilitates transparent explanation and interpretation of machine-learning predictions and their experimental sources.
Affiliation(s)
- Brian J. Haas
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Anne Van Arsdale
  - Department of Obstetrics and Gynecology and Women’s Health, Albert Einstein Montefiore Medical Center, Bronx, NY 10461, USA
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Timothy Tickle
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- James T. Robinson
  - School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Riaz Gillani
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02215, USA
  - Boston Children’s Hospital, Boston, MA 02115, USA
- Simon Kasif
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Aviv Regev
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
42
Kojima S, Koyama S, Ka M, Saito Y, Parrish EH, Endo M, Takata S, Mizukoshi M, Hikino K, Takeda A, Gelinas AF, Heaton SM, Koide R, Kamada AJ, Noguchi M, Hamada M, Kamatani Y, Murakawa Y, Ishigaki K, Nakamura Y, Ito K, Terao C, Momozawa Y, Parrish NF. Mobile element variation contributes to population-specific genome diversification, gene regulation and disease risk. Nat Genet 2023:10.1038/s41588-023-01390-2. [PMID: 37169872 DOI: 10.1038/s41588-023-01390-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/13/2022] [Accepted: 04/04/2023] [Indexed: 05/13/2023]
Abstract
Mobile genetic elements (MEs) are heritable mutagens that recursively generate structural variants (SVs). ME variants (MEVs) are difficult to genotype and integrate in statistical genetics, obscuring their impact on genome diversification and traits. We developed a tool that accurately genotypes MEVs using short-read whole-genome sequencing (WGS) and applied it to global human populations. We find unexpected population-specific MEV differences, including an Alu insertion distribution distinguishing Japanese from other populations. Integrating MEVs with expression quantitative trait loci (eQTL) maps shows that MEV classes regulate tissue-specific gene expression by shared mechanisms, including creating or attenuating enhancers and recruiting post-transcriptional regulators, supporting class-wide interpretability. MEVs more often associate with gene expression changes than SNVs, thus plausibly impacting traits. Performing genome-wide association study (GWAS) with MEVs pinpoints potential causes of disease risk, including a LINE-1 insertion associated with keloid and fasciitis. This work implicates MEVs as drivers of human divergence and disease risk.
Affiliation(s)
- Shohei Kojima
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Satoshi Koyama
  - Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
- Mirei Ka
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
  - Next-Generation Precision Medicine Development, Integrative Genomics Laboratory, Graduate School of Medicine, Department of Medical Science, The University of Tokyo, Tokyo, Japan
- Yuka Saito
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
  - Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan
- Erica H Parrish
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Mikiko Endo
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Sadaaki Takata
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Misaki Mizukoshi
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Keiko Hikino
  - Laboratory for Pharmacogenomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Atsushi Takeda
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Asami F Gelinas
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Steven M Heaton
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Rie Koide
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Anselmo J Kamada
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
  - Paleovirology Lab, Department of Biology, University of Oxford, Oxford, UK
- Michiya Noguchi
  - Cell Engineering Division, BioResource Research Center, RIKEN, Tsukuba, Japan
- Michiaki Hamada
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Yoichiro Kamatani
  - Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yasuhiro Murakawa
  - RIKEN-IFOM Joint Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto, Japan
  - IFOM ETS - the AIRC Institute of Molecular Oncology, Milan, Italy
- Kazuyoshi Ishigaki
  - Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yukio Nakamura
  - Cell Engineering Division, BioResource Research Center, RIKEN, Tsukuba, Japan
- Kaoru Ito
  - Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Yukihide Momozawa
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Nicholas F Parrish
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
43
Gong Y, Li Y, Liu X, Ma Y, Jiang L. A review of the pangenome: how it affects our understanding of genomic variation, selection and breeding in domestic animals? J Anim Sci Biotechnol 2023; 14:73. [PMID: 37143156 PMCID: PMC10161434 DOI: 10.1186/s40104-023-00860-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/24/2022] [Accepted: 03/01/2023] [Indexed: 05/06/2023] Open
Abstract
As large-scale genomic studies have progressed, it has become clear that a single reference genome cannot represent genetic diversity at the species level. Domestic animals tend to have complex routes of origin and migration, which suggests that some population-specific sequences may be missing from the current reference genome. In contrast, the pangenome is a collection of all DNA sequences of a species: it contains the sequences shared by all individuals (core genome) and also displays sequence information unique to each individual (variable genome). Progress in pangenome research in humans, plants and domestic animals has shown that missing genetic components can be recovered and large structural variants (SVs) identified through pangenomic studies. Many individual-specific sequences have been shown to be related to biological adaptability, phenotype and important economic traits. The maturation of technologies and methods such as third-generation sequencing, telomere-to-telomere genomes, graph genomes, and reference-free assembly will further promote the development of pangenome research. In the future, pangenomes combined with long-read data and multi-omics will help to resolve large SVs and their relationship with the main economic traits of interest in domesticated animals, providing better insights into animal domestication, evolution and breeding. In this review, we mainly discuss how pangenome analysis reveals genetic variations in domestic animals (sheep, cattle, pigs, chickens) and their impacts on phenotypes, and how this can contribute to the understanding of species diversity. We also discuss potential issues and future perspectives of pangenome research in livestock and poultry.
Affiliation(s)
- Ying Gong
  - Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
  - National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- Yefang Li
  - Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
  - National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- Xuexue Liu
  - Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
  - National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
  - Centre d'Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, 37 allées Jules Guesde, Toulouse, 31000, France
- Yuehui Ma
  - Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
  - National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- Lin Jiang
  - Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
  - National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
44
Lee YL, Bosse M, Takeda H, Moreira GCM, Karim L, Druet T, Oget-Ebrad C, Coppieters W, Veerkamp RF, Groenen MAM, Georges M, Bouwman AC, Charlier C. High-resolution structural variants catalogue in a large-scale whole genome sequenced bovine family cohort data. BMC Genomics 2023; 24:225. [PMID: 37127590 PMCID: PMC10152703 DOI: 10.1186/s12864-023-09259-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/28/2022] [Accepted: 03/20/2023] [Indexed: 05/03/2023] Open
Abstract
BACKGROUND: Structural variants (SVs) are chromosomal segments that differ between genomes, such as deletions, duplications, insertions, inversions and translocations. The genomics revolution enabled the discovery of sub-microscopic SVs via array and whole-genome sequencing (WGS) data, paving the way to unravel the functional impact of SVs. Recent human expression QTL mapping studies demonstrated that SVs play a disproportionately large role in altering gene expression, underlining the importance of including SVs in genetic analyses. Therefore, this study aimed to generate and explore a high-quality bovine SV catalogue exploiting unique cattle family cohort data (266 samples in total, forming 127 trios).
RESULTS: We curated 13,731 SVs segregating in the population, consisting of 12,201 deletions, 1,509 duplications, and 21 multi-allelic CNVs (> 50 bp). Of these, we validated a subset of copy number variants (CNVs) utilising a direct genotyping approach in an independent cohort, indicating that at least 62% of the CNVs are true variants segregating in the population. Among gene-disrupting SVs, we prioritised two likely high-impact duplications, encompassing the ORM1 and POPDC3 genes, respectively. Liver expression QTL mapping results revealed that these duplications are likely causing altered gene expression, confirming the functional importance of SVs. Although most of the accurately genotyped CNVs are tagged by single nucleotide polymorphisms (SNPs) ascertained in WGS data, most CNVs were not captured by individual SNPs obtained from a 50K genotyping array.
CONCLUSION: We generated a high-quality SV catalogue exploiting unique whole-genome-sequenced bovine family cohort data. Two high-impact duplications upregulating ORM1 and POPDC3 are putative candidates for postpartum feed intake and hoof health traits, thus warranting further investigation. Generally, CNVs were in low LD with SNPs on the 50K array. Hence, it remains crucial to incorporate CNVs via means other than tagging SNPs, such as investigation of tagging haplotypes, direct imputation of CNVs, or direct genotyping as done in the current study. The SV catalogue and the custom genotyping array generated in the current study will serve as valuable resources accelerating utilisation of the full spectrum of genetic variants in bovine genomes.
Affiliation(s)
- Young-Lim Lee
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
  - Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R, University of Liège, Liège, Belgium
- Mirte Bosse
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Haruko Takeda
  - Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R, University of Liège, Liège, Belgium
- Latifa Karim
  - GIGA Institute, GIGA Genomics Platform, University of Liège, Liège, Belgium
- Tom Druet
  - Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R, University of Liège, Liège, Belgium
- Claire Oget-Ebrad
  - Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R, University of Liège, Liège, Belgium
- Wouter Coppieters
  - Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R, University of Liège, Liège, Belgium
  - GIGA Institute, GIGA Genomics Platform, University of Liège, Liège, Belgium
- Roel F Veerkamp
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Martien A M Groenen
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Michel Georges
  - Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R, University of Liège, Liège, Belgium
- Aniek C Bouwman
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Carole Charlier
  - Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R, University of Liège, Liège, Belgium
45
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023; 617:312-324. [PMID: 37165242 PMCID: PMC10172123 DOI: 10.1038/s41586-023-05896-x] [Citation(s) in RCA: 170] [Impact Index Per Article: 170.0] [Received: 07/09/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Affiliation(s)
- Wen-Wei Liao
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
  - Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
  - Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Mobin Asri
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Daniel Doerr
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Marina Haukness
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Glenn Hickey
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Shuangjia Lu
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
  - Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Julian K Lucas
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Jean Monlong
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Haley J Abel
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Silvia Buonaiuto
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Xian H Chang
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Haoyu Cheng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Justin Chu
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Vincenza Colonna
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jordan M Eizenga
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Xiaowen Feng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Christian Fischer
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Robert S Fulton
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Shilpa Garg
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Cristian Groza
  - Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Genomics Research Centre, Human Technopole, Milan, Italy
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Simon Heumos
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Miten Jain
  - Northeastern University, Boston, MA, USA
- Tsung-Yu Lu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Charles Markello
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M Novak
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Hugh E Olsen
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Trevor Pesout
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jonas A Sibbesen
  - Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
- Jouni Sirén
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Chad Tomlinson
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Flavia Villani
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Mitchell R Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Carl A Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Konstantinos Billis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sarah Cody
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Robert M Cook-Deegan
  - Barrett and O'Connor Washington Center, Arizona State University, Washington, DC, USA
- Omar E Cornejo
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
- Mark Diekhans
  - Genomics Institute, University of California, Santa Cruz, CA, USA
- Peter Ebert
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
  - Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Susan Fairley
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam L Felsenfeld
  - National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Yan Gao
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Nanibaa' A Garrison
  - Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, CA, USA
  - Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
  - Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Carlos Garcia Giron
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Richard E Green
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
  - Dovetail Genomics, Scotts Valley, CA, USA
- Leanne Haggerty
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Hanlee P Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Eimear E Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Barbara A Koenig
  - Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, CA, USA
- Jan O Korbel
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Jennifer Kordosky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- HoJoon Lee
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Hugo Magalhães
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Santiago Marco-Sola
  - Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
  - Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
- Pierre Marijon
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Ann McCartney
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jennifer McDaniel
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Alice B Popejoy
  - Department of Public Health Sciences, University of California, Davis, CA, USA
- Daniela Puiu
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mikko Rautiainen
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Allison A Regier
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Samuel Sacco
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
- Ashley D Sanders
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Valerie A Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Baergen I Schultz
  - National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
- Michael W Smith
  - National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
- Heidi J Sofia
  - National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
- Ahmad N Abou Tayoun
  - Al Jalila Genomics Center of Excellence, Al Jalila Children's Specialty Hospital, Dubai, UAE
  - Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Francesca Floriana Tricomi
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Brian Walenz
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Aleksey V Zimin
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Guillaume Bourque
  - Department of Human Genetics, McGill University, Montréal, Québec, Canada
  - Canadian Center for Computational Genomics, McGill University, Montréal, Québec, Canada
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - David Haussler
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Karen H Miga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany.
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany.
| | - Ira M Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA.
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA.
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, CA, USA.
| |
|
46
|
Comaills V, Castellano-Pozo M. Chromosomal Instability in Genome Evolution: From Cancer to Macroevolution. Biology (Basel) 2023; 12:biology12050671. [PMID: 37237485 DOI: 10.3390/biology12050671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023]
Abstract
The integrity of the genome is crucial for the survival of all living organisms. However, genomes need to adapt to survive certain pressures, and for this purpose use several mechanisms to diversify. Chromosomal instability (CIN) is one of the main mechanisms leading to the creation of genomic heterogeneity by altering the number of chromosomes and changing their structures. In this review, we discuss the different chromosomal patterns and changes observed in speciation, in evolutionary biology, and during tumor progression. By nature, the human genome shows an induction of diversity during gametogenesis as well as during tumorigenesis, which can result in changes ranging from drastic ones, such as whole-genome doubling, to more discrete ones, such as the complex chromosomal rearrangement chromothripsis. More importantly, changes observed during speciation are strikingly similar to the genomic evolution observed during tumor progression and resistance to therapy. We address the different origins of CIN, including the importance of double-strand breaks (DSBs) and the consequences of micronuclei. We also explain the mechanisms behind the controlled DSBs and the recombination of homologous chromosomes observed during meiosis, to show how errors lead to patterns similar to those observed during tumorigenesis. Finally, we list several diseases associated with CIN, resulting in fertility issues, miscarriage, rare genetic diseases, and cancer. A better understanding of chromosomal instability as a whole is essential for understanding the mechanisms leading to tumor progression.
Affiliation(s)
- Valentine Comaills
- Andalusian Center for Molecular Biology and Regenerative Medicine-CABIMER, University of Pablo de Olavide-University of Seville-CSIC, Junta de Andalucía, 41092 Seville, Spain
| | - Maikel Castellano-Pozo
- Andalusian Center for Molecular Biology and Regenerative Medicine-CABIMER, University of Pablo de Olavide-University of Seville-CSIC, Junta de Andalucía, 41092 Seville, Spain
- Genetic Department, Faculty of Biology, University of Seville, 41080 Seville, Spain
| |
|
47
|
Steensma MJ, Lee YL, Bouwman AC, Pita Barros C, Derks MFL, Bink MCAM, Harlizius B, Huisman AE, Crooijmans RPMA, Groenen MAM, Mulder HA, Rochus CM. Identification and characterisation of de novo germline structural variants in two commercial pig lines using trio-based whole genome sequencing. BMC Genomics 2023; 24:208. [PMID: 37072725 PMCID: PMC10114323 DOI: 10.1186/s12864-023-09296-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 04/04/2023] [Indexed: 04/20/2023] Open
Abstract
BACKGROUND: De novo mutations arising in the germline are a source of genetic variation and their discovery broadens our understanding of genetic disorders and evolutionary patterns. Although the number of de novo single nucleotide variants (dnSNVs) has been studied in a number of species, relatively little is known about the occurrence of de novo structural variants (dnSVs). In this study, we investigated 37 deeply sequenced pig trios from two commercial lines to identify dnSVs present in the offspring. The identified dnSVs were characterised by identifying their parent of origin and their functional annotations, and by characterising sequence homology at the breakpoints. RESULTS: We identified four swine germline dnSVs, all located in intronic regions of protein-coding genes. Our conservative, first estimate of the swine germline dnSV rate is 0.108 (95% CI 0.038-0.255) per generation (one dnSV per nine offspring), detected using short-read sequencing. Two detected dnSVs are clusters of mutations. Mutation cluster 1 contains a de novo duplication, a dnSNV and a de novo deletion. Mutation cluster 2 contains a de novo deletion and three de novo duplications, of which one is inverted. Mutation cluster 2 is 25 kb in size, whereas mutation cluster 1 (197 bp) and the other two individual dnSVs (64 and 573 bp) are smaller. Only mutation cluster 2 could be phased and is located on the paternal haplotype. Mutation cluster 2 originates from both micro-homology and non-homology mutation mechanisms, whereas mutation cluster 1 and the other two dnSVs are caused by mutation mechanisms lacking sequence homology. The 64 bp deletion and mutation cluster 1 were validated through PCR. Lastly, the 64 bp deletion and the 573 bp duplication were validated in sequenced offspring of probands with three generations of sequence data.
CONCLUSIONS: Our estimate of 0.108 dnSVs per generation in the swine germline is conservative, due to our small sample size and restricted possibilities of dnSV detection from short-read sequencing. The current study highlights the complexity of dnSVs and shows the potential of breeding programs for pigs and livestock species in general, to provide a suitable population structure for identification and characterisation of dnSVs.
Affiliation(s)
- Marije J Steensma
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands.
| | - Y L Lee
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - A C Bouwman
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - C Pita Barros
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - M F L Derks
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
- Topigs Norsvin Research Center, Schoenaker 6, Beuningen, 6641 SZ, the Netherlands
| | - M C A M Bink
- Hendrix Genetics, P.O. Box 114, Boxmeer, 5830 AC, the Netherlands
| | - B Harlizius
- Topigs Norsvin Research Center, Schoenaker 6, Beuningen, 6641 SZ, the Netherlands
| | - A E Huisman
- Hendrix Genetics, P.O. Box 114, Boxmeer, 5830 AC, the Netherlands
| | - R P M A Crooijmans
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - M A M Groenen
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - H A Mulder
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - C M Rochus
- University of Guelph, Centre for Genetic Improvement of Livestock, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada
| |
|
48
|
Abstract
Comparative studies of hominids have long sought to identify mutational events that shaped the evolution of the human nervous system. However, functional genetic differences are outnumbered by millions of nearly neutral mutations, and the developmental mechanisms underlying human nervous system specializations are difficult to model and incompletely understood. Candidate-gene studies have attempted to map select human-specific genetic differences to neurodevelopmental functions, but it remains unclear how to contextualize the relative effects of genes that are investigated independently. Considering these limitations, we discuss scalable approaches for probing the functional contributions of human-specific genetic differences. We propose that a systems-level view will enable a more quantitative and integrative understanding of the genetic, molecular and cellular underpinnings of human nervous system evolution.
Affiliation(s)
- Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA. https://twitter.com/@TylerFair_
| | - Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA.
| |
|
49
|
Arslan A, Fang Z, Wang M, Tan Y, Cheng Z, Chen X, Guan Y, Pisani LJ, Yoo B, Bejerano G, Peltz G. Analysis of structural variation among inbred mouse strains. BMC Genomics 2023; 24:97. [PMID: 36864393 PMCID: PMC9983223 DOI: 10.1186/s12864-023-09197-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Accepted: 02/17/2023] [Indexed: 03/04/2023] Open
Abstract
BACKGROUND: 'Long read' sequencing methods have been used to identify previously uncharacterized structural variants that cause human genetic diseases. Therefore, we investigated whether long read sequencing could facilitate genetic analysis of murine models for human diseases. RESULTS: The genomes of six inbred strains (BTBR T + Itpr3tf/J, 129Sv1/J, C57BL/6/J, Balb/c/J, A/J, SJL/J) were analyzed using long read sequencing. Our results revealed that (i) structural variants are very abundant within the genome of inbred strains (4.8 per gene) and (ii) we cannot accurately infer whether structural variants are present using conventional short read genomic sequence data, even when nearby SNP alleles are known. The advantage of having a more complete map was demonstrated by analyzing the genomic sequence of BTBR mice. Based upon this analysis, knockin mice were generated and used to characterize a BTBR-unique 8-bp deletion within Draxin that contributes to the BTBR neuroanatomic abnormalities, which resemble human autism spectrum disorder. CONCLUSION: A more complete map of the pattern of genetic variation among inbred strains, which is produced by long read genomic sequencing of the genomes of additional inbred strains, could facilitate genetic discovery when murine models of human diseases are analyzed.
Affiliation(s)
- Ahmed Arslan
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, 94305 Stanford, CA USA
| | - Zhuoqing Fang
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, 94305 Stanford, CA USA
| | - Meiyue Wang
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, 94305 Stanford, CA USA
| | - Yalun Tan
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, 94305 Stanford, CA USA
| | - Zhuanfen Cheng
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, 94305 Stanford, CA USA
| | - Xinyu Chen
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, 94305 Stanford, CA USA
| | - Yuan Guan
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, 94305 Stanford, CA USA
| | | | - Boyoung Yoo
- Dept. of Computer Science, Stanford School of Engineering, 94305 Stanford, CA USA
| | - Gill Bejerano
- Dept. of Computer Science, Stanford School of Engineering, 94305 Stanford, CA USA
- Developmental Biology, Biomedical Data Science, Stanford School of Medicine, 94305 Stanford, CA USA
| | - Gary Peltz
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, 94305, Stanford, CA, USA.
| |
|
50
|
Mérot C, Stenløkk KSR, Venney C, Laporte M, Moser M, Normandeau E, Árnyasi M, Kent M, Rougeux C, Flynn JM, Lien S, Bernatchez L. Genome assembly, structural variants, and genetic differentiation between lake whitefish young species pairs (Coregonus sp.) with long and short reads. Mol Ecol 2023; 32:1458-1477. [PMID: 35416336 DOI: 10.1111/mec.16468] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/15/2022] [Revised: 03/24/2022] [Accepted: 04/01/2022] [Indexed: 11/26/2022]
Abstract
Nascent pairs of ecologically differentiated species offer an opportunity to get a better glimpse at the genetic architecture of speciation. Of particular interest is our recent ability to consider a wider range of genomic variants, not only single-nucleotide polymorphisms (SNPs), thanks to long-read sequencing technology. We can now identify structural variants (SVs) such as insertions, deletions and other rearrangements, allowing further insights into the genetic architecture of speciation and how different types of variants are involved in species differentiation. Here, we investigated genomic patterns of differentiation between sympatric species pairs (Dwarf and Normal) belonging to the lake whitefish (Coregonus clupeaformis) species complex. We assembled the first reference genomes for both C. clupeaformis sp. Normal and C. clupeaformis sp. Dwarf, annotated the transposable elements and analysed the genomes in the light of related coregonid species. Next, we used a combination of long- and short-read sequencing to characterize SVs and genotype them at the population scale using genome-graph approaches, showing that SVs cover five times more of the genome than SNPs. We then integrated both SNPs and SVs to investigate the genetic architecture of species differentiation in two different lakes and highlighted an excess of shared outliers of differentiation. In particular, a large fraction of SVs differentiating the two species correspond to insertions or deletions of transposable elements (TEs), suggesting that TE accumulation may represent a key component of genetic divergence between the Dwarf and Normal species. Together, our results suggest that SVs may play an important role in speciation and that, by combining second- and third-generation sequencing, we now have the ability to integrate SVs into speciation genomics.
Affiliation(s)
- Claire Mérot
- Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
- UMR 6553 Ecobio, OSUR, CNRS, Université de Rennes, Rennes, France
| | - Kristina S R Stenløkk
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
| | - Clare Venney
- Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
| | - Martin Laporte
- Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
- Ministère des Forêts, de la Faune et des Parcs (MFFP) du Québec, Québec, Québec, Canada
| | - Michel Moser
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
| | - Eric Normandeau
- Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
| | - Mariann Árnyasi
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
| | - Matthew Kent
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
| | - Clément Rougeux
- Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
| | - Jullien M Flynn
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
| | - Sigbjørn Lien
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
| | - Louis Bernatchez
- Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
| |
|