1
Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, Nguyen LT, Ross EM, Hayes BJ, Chamberlain AJ, MacLeod IM. In it for the long run: perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants. Genet Sel Evol 2023; 55:9. [PMID: 36721111 PMCID: PMC9887926 DOI: 10.1186/s12711-023-00783-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 01/23/2023] [Indexed: 02/02/2023]
Abstract
Studies have demonstrated that structural variants (SV) play a substantial role in the evolution of species and have an impact on Mendelian traits in the genome. However, unlike small variants (< 50 bp), it has been challenging to accurately identify and genotype SV at the population scale using short-read sequencing. Long-read sequencing technologies are becoming competitively priced and can address several of the disadvantages of short-read sequencing for the discovery and genotyping of SV. In livestock species, analysis of SV at the population scale still faces challenges due to the lack of resources, high costs, technological barriers, and computational limitations. In this review, we summarize recent progress in the characterization of SV in the major livestock species, the obstacles that still need to be overcome, as well as the future directions in this growing field. It seems timely that research communities pool resources to build global population-scale long-read sequencing consortiums for the major livestock species for which the application of genomic tools has become cost-effective.
Affiliation(s)
- Tuan V. Nguyen
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia (GRID grid.452283.a, ISNI 0000 0004 0407 2669)
- Christy J. Vander Jagt
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia (GRID grid.452283.a, ISNI 0000 0004 0407 2669)
- Jianghui Wang
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia (GRID grid.452283.a, ISNI 0000 0004 0407 2669)
- Hans D. Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia (GRID grid.452283.a, ISNI 0000 0004 0407 2669)
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia (GRID grid.1018.8, ISNI 0000 0001 2342 0938)
- Ruidong Xiang
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia (GRID grid.452283.a, ISNI 0000 0004 0407 2669)
  - Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052 Australia (GRID grid.1008.9, ISNI 0000 0001 2179 088X)
- Michael E. Goddard
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia (GRID grid.452283.a, ISNI 0000 0004 0407 2669)
  - Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052 Australia (GRID grid.1008.9, ISNI 0000 0001 2179 088X)
- Loan T. Nguyen
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia (GRID grid.1003.2, ISNI 0000 0000 9320 7537)
- Elizabeth M. Ross
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia (GRID grid.1003.2, ISNI 0000 0000 9320 7537)
- Ben J. Hayes
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia (GRID grid.1003.2, ISNI 0000 0000 9320 7537)
- Amanda J. Chamberlain
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia (GRID grid.452283.a, ISNI 0000 0004 0407 2669)
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia (GRID grid.1018.8, ISNI 0000 0001 2342 0938)
- Iona M. MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia (GRID grid.452283.a, ISNI 0000 0004 0407 2669)
2
Braga LG, Chud TCS, Watanabe RN, Savegnago RP, Sena TM, do Carmo AS, Machado MA, Panetto JCDC, da Silva MVGB, Munari DP. Identification of copy number variations in the genome of Dairy Gir cattle. PLoS One 2023; 18:e0284085. [PMID: 37036840 PMCID: PMC10085049 DOI: 10.1371/journal.pone.0284085] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/13/2022] [Accepted: 03/23/2023] [Indexed: 04/11/2023]
Abstract
Studying structural variants that can control complex traits is relevant for dairy cattle production, especially for animals that are tolerant to breeding conditions in the tropics, such as the Dairy Gir cattle. This study identified and characterized high confidence copy number variation regions (CNVR) in the Gir breed genome. A total of 38 animals were whole-genome sequenced, and 566 individuals were genotyped with a high-density SNP panel, among which 36 animals had both sequencing and SNP genotyping data available. Two sets of high confidence CNVR were established: one based on common CNV identified in the studied population (CNVR_POP), and another with CNV identified in sires with both sequence and SNP genotyping data available (CNVR_ANI). We found 10 CNVR_POP and 45 CNVR_ANI, which covered 1.05 Mb and 4.4 Mb of the bovine genome, respectively. Merging these CNV sets for functional analysis resulted in 48 unique high confidence CNVR. The overlapping genes were previously related to embryonic mortality, environmental adaptation, evolutionary process, immune response, longevity, mammary gland, resistance to gastrointestinal parasites, and stimuli recognition, among others. Our results contribute to a better understanding of the Gir breed genome. Moreover, the CNV identified in this study can potentially affect genes related to complex traits, such as production, health, and reproduction.
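The merging of CNV sets into non-redundant CNV regions described in this abstract can be sketched as below. This is a minimal illustration only, assuming that calls on the same chromosome are merged when they share at least one base pair; the abstract does not state the overlap criterion the authors applied, and the function name `merge_cnvs` and all coordinates are hypothetical.

```python
from collections import defaultdict


def merge_cnvs(cnvs):
    """Merge overlapping CNV calls into CNV regions (CNVR).

    cnvs: iterable of (chrom, start, end) tuples with start <= end.
    Assumption: closed intervals on the same chromosome that share at
    least one base pair are collapsed into a single region.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous region: extend its end if needed.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # No overlap: open a new region.
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions


# Hypothetical coordinates, not taken from the study.
calls = [
    ("chr1", 100, 500),
    ("chr1", 400, 900),
    ("chr1", 2000, 2500),
    ("chr2", 50, 80),
]
cnvrs = merge_cnvs(calls)
# Total sequence covered by the merged regions, in base pairs.
total_bp = sum(end - start + 1 for _, start, end in cnvrs)
print(cnvrs, total_bp)
```

Under this kind of rule, the 10 CNVR_POP and 45 CNVR_ANI (55 regions in total) collapsing to 48 unique CNVR would be what one expects if some regions from the two sets overlap.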
Affiliation(s)
- Larissa G Braga
  - Departamento de Engenharia e Ciências Exatas, Universidade Estadual Paulista, Jaboticabal, São Paulo, Brazil
- Tatiane C S Chud
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, Ontario, Canada
- Rafael N Watanabe
  - Departamento de Engenharia e Ciências Exatas, Universidade Estadual Paulista, Jaboticabal, São Paulo, Brazil
- Rodrigo P Savegnago
  - Department of Animal Science, Michigan State University, East Lansing, Michigan, United States of America
- Thomaz M Sena
  - Departamento de Engenharia e Ciências Exatas, Universidade Estadual Paulista, Jaboticabal, São Paulo, Brazil
- Adriana S do Carmo
  - Departamento de Zootecnia, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
- Danísio P Munari
  - Departamento de Engenharia e Ciências Exatas, Universidade Estadual Paulista, Jaboticabal, São Paulo, Brazil
3
Lesack K, Mariene GM, Andersen EC, Wasmuth JD. Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans. PLoS One 2022; 17:e0278424. [PMID: 36584177 PMCID: PMC9803319 DOI: 10.1371/journal.pone.0278424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 11/15/2022] [Indexed: 01/01/2023]
Abstract
The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of popular tools remains unclear due to the limitations of existing benchmarks. Moreover, the performance of these tools for predicting variants in non-human genomes is less certain, as most tools were developed and benchmarked using data from the human genome. To evaluate the use of long-read data for the validation of short-read structural variant calls, the agreement between predictions from a short-read ensemble learning method and long-read tools was compared using real and simulated data from Caenorhabditis elegans. The results obtained from simulated data indicate that the best performing tool is contingent on the type and size of the variant, as well as the sequencing depth of coverage. These results also highlight the need for reference datasets generated from real data that can be used as 'ground truth' in benchmarks.
Affiliation(s)
- Kyle Lesack
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
- Grace M. Mariene
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
- Erik C. Andersen
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL, United States of America
- James D. Wasmuth
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
  - Corresponding author
4
Talenti A, Powell J, Wragg D, Chepkwony M, Fisch A, Ferreira BR, Mercadante MEZ, Santos IM, Ezeasor CK, Obishakin ET, Muhanguzi D, Amanyire W, Silwamba I, Muma JB, Mainda G, Kelly RF, Toye P, Connelley T, Prendergast J. Optical mapping compendium of structural variants across global cattle breeds. Sci Data 2022; 9:618. [PMID: 36229544 PMCID: PMC9561109 DOI: 10.1038/s41597-022-01684-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 09/04/2022] [Indexed: 11/30/2022]
Abstract
Structural variants (SV) have been linked to important bovine disease phenotypes, but due to the difficulty of their accurate detection with standard sequencing approaches, their role in shaping important traits across cattle breeds is largely unexplored. Optical mapping is an alternative approach for mapping SVs that has been shown to have higher sensitivity than DNA sequencing approaches. The aim of this project was to use optical mapping to develop a high-quality database of structural variation across cattle breeds from different geographical regions, to enable further study of SVs in cattle. To do this we generated 100X Bionano optical mapping data for 18 cattle of nine different ancestries, three continents and both cattle sub-species. In total we identified 13,457 SVs, of which 1,200 putatively overlap coding regions. This resource provides a high-quality set of optical mapping-based SV calls that can be used across studies, from validating DNA sequencing-based SV calls to prioritising candidate functional variants in genetic association studies and expanding our understanding of the role of SVs in cattle evolution.
Measurement(s): Optical Mapping
Technology Type(s): Optical Mapping
Factor Type(s): Structural variants
Sample Characteristic - Organism: Bos taurus
Sample Characteristic - Location: United Kingdom • Kenya • Zambia • Uganda • Brazil • Nigeria
Affiliation(s)
- A Talenti
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
- J Powell
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
- D Wragg
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
  - Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, UK
- M Chepkwony
  - The International Livestock Research Institute, PO Box 30709, Nairobi, Kenya
  - Centre for Tropical Livestock Genetics and Health, ILRI Kenya, Nairobi, 30709-00100, Kenya
- A Fisch
  - Ribeirão Preto College of Nursing, University of Sao Paulo, Ribeirão Preto, SP, Brazil
- B R Ferreira
  - Ribeirão Preto College of Nursing, University of Sao Paulo, Ribeirão Preto, SP, Brazil
- M E Z Mercadante
  - Institute of Animal Science, Agriculture Department of São Paulo Government, Sertãozinho, SP, 14.174-000, Brazil
- I M Santos
  - Ribeirão Preto School of Medicine, University of São Paulo, Ribeirão Preto, SP, 14049-900, Brazil
- C K Ezeasor
  - Department of Veterinary Pathology and Microbiology, University of Nigeria, Nsukka, Enugu State, Nigeria
- E T Obishakin
  - Biotechnology Division, National Veterinary Research Institute, Vom, Plateau State, Nigeria
  - Biomedical Research Centre, Ghent University Global Campus, Songdo, Incheon, South Korea
- D Muhanguzi
  - School of Biosecurity, Biotechnology and Laboratory Sciences (SBLS), College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, P.O Box 7062, Kampala, Uganda
- W Amanyire
  - School of Biosecurity, Biotechnology and Laboratory Sciences (SBLS), College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, P.O Box 7062, Kampala, Uganda
- I Silwamba
  - Department of Disease Control, School of Veterinary Medicine, University of Zambia, P.O BOX 32379, Lusaka, Zambia
  - Department of Laboratory and Diagnostics, Livestock Services Cooperative Society, P.O. BOX 32025, Lusaka, Zambia
- J B Muma
  - Department of Disease Control, School of Veterinary Medicine, University of Zambia, P.O BOX 32379, Lusaka, Zambia
- G Mainda
  - Department of Veterinary Services, Ministry of Fisheries and Livestock, Central Veterinary Research Institute, P.O. Box 33980, Lusaka, Zambia
- R F Kelly
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
  - Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, UK
- P Toye
  - The International Livestock Research Institute, PO Box 30709, Nairobi, Kenya
- T Connelley
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
- J Prendergast
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
5
Gao Y, Ma L, Liu GE. Initial Analysis of Structural Variation Detections in Cattle Using Long-Read Sequencing Methods. Genes (Basel) 2022; 13:genes13050828. [PMID: 35627213 PMCID: PMC9142105 DOI: 10.3390/genes13050828] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/15/2022] [Revised: 05/01/2022] [Accepted: 05/04/2022] [Indexed: 02/01/2023]
Abstract
Structural variations (SVs), a major source of genetic variation, are widely distributed in the genome. SVs involve longer genomic sequences and potentially have stronger effects than SNPs, but they are not well captured by short-read sequencing owing to their size and their association with repeats. Improved characterization of SVs can provide deeper insight into complex traits. With the availability of long-read sequencing, it has become feasible to uncover the full range of SVs. Here, we sequenced one cattle individual using 10× Genomics (10×G) linked reads, Pacific Biosciences (PacBio) continuous long reads (CLR) and circular consensus sequencing (CCS), as well as Oxford Nanopore Technologies (ONT) PromethION. We evaluated the ability of various methods to detect SVs. We identified 21,164 SVs, which amount to 186 Mb and cover 7.07% of the whole genome. More SVs were inferred from long reads than from short reads. PacBio CLR identified the most large SVs and covered the most genomic sequence. SVs called with PacBio CCS and ONT data showed high uniformity. The PacBio CCS calls overlapped most with the results obtained from short-read data. Together, we found that long reads outperformed short reads for SV detection.
Affiliation(s)
- Yahui Gao
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD 20705, USA
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Li Ma
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- George E. Liu
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD 20705, USA
  - Correspondence: Tel.: +1-301-504-9843
6
Zhou J, Liu L, Reynolds E, Huang X, Garrick D, Shi Y. Discovering Copy Number Variation in Dual-Purpose XinJiang Brown Cattle. Front Genet 2022; 12:747431. [PMID: 35222511 PMCID: PMC8873982 DOI: 10.3389/fgene.2021.747431] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/26/2021] [Accepted: 11/01/2021] [Indexed: 12/02/2022]
Abstract
Copy number variants (CNVs), which are a class of structural variant, can be important in relating genomic variation to phenotype. The primary aims of this study were to discover the common CNV regions (CNVRs) in the dual-purpose XinJiang-Brown cattle population and to detect differences between CNVs inferred using the ARS-UCD 1.2 (ARS) or the UMD 3.1 (UMD) genome assemblies based on the 150K SNP (single nucleotide polymorphism) chip. The PennCNV and CNVPartition methods were applied to calculate the deviation of the standardized signal intensity of SNP markers to detect CNV status. Following the discovery of CNVs, we used the R package HandyCNV to generate and visualize CNVRs, compare CNVs and CNVRs between genome assemblies, and identify consensus genes using annotation resources. Using PennCNV and CNVPartition, we identified 38 consensus CNVRs with the ARS assembly, covering 1.95% of the whole genome, and 33 consensus CNVRs with the UMD assembly, covering 1.46% of the whole genome. We identified 37 genes that intersected 13 common CNVs (>5% frequency); these included functionally interesting genes such as GBP4, for which an increased copy number has been negatively associated with cattle stature, and the BoLA gene family, which has been linked to the immune response and adaptation of cattle. The ARS map file of the GGP Bovine 150K Bead Chip maps the genomic position of more SNPs with increased accuracy compared to the UMD map file. Comparison of the CNVRs identified between the two reference assemblies suggests the newly released ARS reference assembly is better for CNV detection. In addition, different CNV detection methods can complement each other to generate a larger number of CNVRs than a single approach and can highlight more genes of interest.
Affiliation(s)
- Jinghang Zhou
  - School of Agriculture, Ningxia University, Yinchuan, China
  - AL Rae Centre for Genetics and Breeding, Massey University, Hamilton, New Zealand
- Liyuan Liu
  - School of Agriculture, Ningxia University, Yinchuan, China
  - AL Rae Centre for Genetics and Breeding, Massey University, Hamilton, New Zealand
- Edwardo Reynolds
  - AL Rae Centre for Genetics and Breeding, Massey University, Hamilton, New Zealand
- Xixia Huang
  - College of Animal Science, Xinjiang Agricultural University, Urumqi, China
- Dorian Garrick
  - AL Rae Centre for Genetics and Breeding, Massey University, Hamilton, New Zealand
  - *Correspondence: Yuangang Shi; Dorian Garrick
- Yuangang Shi
  - School of Agriculture, Ningxia University, Yinchuan, China
  - *Correspondence: Yuangang Shi; Dorian Garrick
7
Chen H, Xue J, Zhang Z, Zhang G, Xu X, Li H, Zhang R, Ullah N, Chen L, Amanullah, Zang Z, Lai S, He X, Li W, Guan M, Li J, Chen L, Deng C. High-speed rail model reveals the gene tandem amplification mediated by short repeated sequence in eukaryote. Sci Rep 2022; 12:2289. [PMID: 35145182 PMCID: PMC8831618 DOI: 10.1038/s41598-022-06250-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 01/24/2022] [Indexed: 02/08/2023]
Abstract
The occurrence of gene duplication/amplification (GDA) provides potential material for adaptive evolution under environmental stress. Several molecular models have been proposed to explain GDA, and recombination via short stretches of sequence similarity plays a crucial role. By screening genomes for such events, we propose a “SRS (short repeated sequence)*N + unit + SRS*N” amplified unit under USCE (unequal sister-chromatid exchange) for tandem amplification mediated by SRS with different repeat numbers in eukaryotes. The amplified units were identified from 2131 well-organized amplification events that generated multiple gene/element copies, with subsequent adaptive evolution in the respective species. The genomic data we analyzed showed dynamic changes among related species or subspecies, or among plants from different ecotypes/strains. This study clarifies the characteristics of variable copy number SRS on both sides of the amplified unit under the USCE mechanism, to explain well-organized gene tandem amplification under environmental stress mediated by SRS in all eukaryotes.
Affiliation(s)
- Haidi Chen
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Jingwen Xue
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Zhenghou Zhang
  - The Fourth Affiliated Hospital of China Medical University, Shenyang, 110032, China
- Geyu Zhang
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Xinyuan Xu
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- He Li
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Ruxue Zhang
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Najeeb Ullah
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Lvxing Chen
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Amanullah
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Zhuqing Zang
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Shanshan Lai
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Ximiao He
  - Department of Physiology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
  - Center for Genomics and Proteomics Research, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
  - Hubei Key Laboratory of Drug Target Research and Pharmacodynamic Evaluation, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Wei Li
  - Department of Dermatovenereology, Institutes for Systems Genetics, Rare Disease Center, West China Hospital, Sichuan University, No. 37 Guo Xue Xiang Street, Chengdu, 610041, Sichuan, China
- Miao Guan
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
- Jingyi Li
  - M.D. Department of Dermatology and Venereology, West China Hospital of Sichuan University, No. 37 Guo Xue Lane, Chengdu, 610041, China
- Liangbiao Chen
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Institute of Experimental Pathology, Shanghai Ocean University, Shanghai, 201306, China
- Cheng Deng
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Rd., Nanjing, 210023, China
8
Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data. BMC Genomics 2021; 22:826. [PMID: 34789167 PMCID: PMC8596897 DOI: 10.1186/s12864-021-08082-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/09/2021] [Accepted: 10/13/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: SNP arrays, short- and long-read genome sequencing are genome-wide high-throughput technologies that may be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies comes with its own limitations and biases, many of which are well-known, but not all of them are thoroughly quantified.
RESULTS: We assembled an ensemble of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This assembly represents a variety of methods and pipelines used for CNV calling from array, short- and long-read technologies. We then performed cross-technology comparisons regarding their ability to call CNVs. Unlike other studies, we refrained from using a gold standard. Instead, we attempted to validate the CNV calls using the raw data of each technology.
CONCLUSIONS: Our study confirms that long-read platforms enable recalling CNVs in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV by different pipelines within each technology is strongly linked to other CNV evidence measures. Importantly, the three technologies show distinct public database frequency profiles, which differ depending on what technology the database was built on.
9
Chen L, Pryce JE, Hayes BJ, Daetwyler HD. Investigating the Effect of Imputed Structural Variants from Whole-Genome Sequence on Genome-Wide Association and Genomic Prediction in Dairy Cattle. Animals (Basel) 2021; 11:ani11020541. [PMID: 33669735 PMCID: PMC7922624 DOI: 10.3390/ani11020541] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/19/2021] [Revised: 02/09/2021] [Accepted: 02/12/2021] [Indexed: 02/06/2023]
Abstract
Simple Summary: Structural variants are large changes to the DNA sequences that differ from individual to individual. We discovered and quality-controlled a set of 24,908 structural variants and used a technique called imputation to infer them into 35,588 Holstein and Jersey cattle. We then investigated whether the structural variants affected key dairy cattle traits such as milk production, fertility and overall conformation. Structural variants explained generally less than 10 percent of the phenotypic variation in these traits. Four of the structural variants were significantly associated with dairy cattle production traits. However, the inclusion of the structural variants in the genomic prediction model did not increase genomic prediction accuracy.
Abstract: Structural variations (SVs) are large DNA segments of deletions, duplications, copy number variations, inversions and translocations in a re-sequenced genome compared to a reference genome. They have been found to be associated with several complex traits in dairy cattle and could potentially help to improve genomic prediction accuracy of dairy traits. Imputation of SVs was performed in individuals genotyped with single-nucleotide polymorphism (SNP) panels without the expense of sequencing them. In this study, we generated 24,908 high-quality SVs in a total of 478 whole-genome sequenced Holstein and Jersey cattle. We imputed 4489 SVs with R² > 0.5 into 35,568 Holstein and Jersey dairy cattle with 578,999 SNPs with two pipelines, FImpute and Eagle2.3-Minimac3. Genome-wide association studies for production, fertility and overall type with these 4489 SVs revealed four significant SVs, of which two were highly linked to significant SNP. We also estimated the variance components for SNP and SV models for these traits using genomic best linear unbiased prediction (GBLUP). Furthermore, we assessed the effect on genomic prediction accuracy of adding SVs to GBLUP models. The estimated percentage of genetic variance captured by SVs for production traits was up to 4.57% for milk yield in bulls and 3.53% for protein yield in cows. Finally, no consistent increase in genomic prediction accuracy was observed when including SVs in GBLUP.
Affiliation(s)
- Long Chen
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Jennie E. Pryce
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Ben J. Hayes
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, St. Lucia, QLD 4067, Australia
- Hans D. Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
  - Corresponding author
10
Guan D, Martínez A, Castelló A, Landi V, Luigi-Sierra MG, Fernández-Álvarez J, Cabrera B, Delgado JV, Such X, Jordana J, Amills M. A genome-wide analysis of copy number variation in Murciano-Granadina goats. Genet Sel Evol 2020; 52:44. [PMID: 32770942 PMCID: PMC7414533 DOI: 10.1186/s12711-020-00564-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/02/2020] [Accepted: 07/28/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND: In this work, our aim was to generate a map of the copy number variations (CNV) segregating in a population of Murciano-Granadina goats, the most important dairy breed in Spain, and to ascertain the main biological functions of the genes that map to copy number variable regions.
RESULTS: Using a dataset that comprised 1036 Murciano-Granadina goats genotyped with the Goat SNP50 BeadChip, we were able to detect 4617 and 7750 autosomal CNV with the PennCNV and QuantiSNP software, respectively. By applying the EnsembleCNV algorithm, these CNV were assembled into 1461 CNV regions (CNVR), of which 486 (33.3% of the total CNVR count) were consistently called by PennCNV and QuantiSNP and used in subsequent analyses. In this set of 486 CNVR, we identified 78 gain, 353 loss and 55 gain/loss events. The total length of all the CNVR (95.69 Mb) represented 3.9% of the goat autosomal genome (2466.19 Mb), whereas their size ranged from 2.0 kb to 11.1 Mb, with an average size of 196.89 kb. Functional annotation of the genes that overlapped with the CNVR revealed an enrichment of pathways related with olfactory transduction (fold-enrichment = 2.33, q-value = 1.61 × 10⁻¹⁰), ABC transporters (fold-enrichment = 5.27, q-value = 4.27 × 10⁻⁴) and bile secretion (fold-enrichment = 3.90, q-value = 5.70 × 10⁻³).
CONCLUSIONS: A previous study reported that the average number of CNVR per goat breed was ~ 20 (978 CNVR/50 breeds), which is much smaller than the number we found here (486 CNVR). We attribute this difference to the fact that the previous study included multiple caprine breeds that were represented by small to moderate numbers of individuals. Given the low frequencies of CNV (in our study, the average frequency of CNV is 1.44%), such a design would probably underestimate the levels of the diversity of CNV at the within-breed level. We also observed that functions related with sensory perception, metabolism and embryo development are overrepresented in the set of genes that overlapped with CNV, and that these loci often belong to large multigene families with tens, hundreds or thousands of paralogous members, a feature that could favor the occurrence of duplications or deletions by non-allelic homologous recombination.
Affiliation(s)
- Dailu Guan: Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
- Amparo Martínez: Departamento de Genética, Universidad de Córdoba, 14071, Córdoba, Spain
- Anna Castelló: Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain; Departament de Ciència Animal i dels Aliments, Facultat de Veterinària, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
- Vincenzo Landi: Departamento de Genética, Universidad de Córdoba, 14071, Córdoba, Spain; Department of Veterinary Medicine, University of Bari "Aldo Moro", SP. 62 per Casamassima km. 3, 70010, Valenzano, BA, Italy
- María Gracia Luigi-Sierra: Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
- Javier Fernández-Álvarez: Asociación Nacional de Criadores de Caprino de Raza Murciano-Granadina (CAPRIGRAN), 18340, Granada, Spain
- Betlem Cabrera: Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain; Departament de Ciència Animal i dels Aliments, Facultat de Veterinària, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
- Xavier Such: Group of Research in Ruminants (G2R), Department of Animal and Food Science, Universitat Autònoma de Barcelona (UAB), Bellaterra, Barcelona, Spain
- Jordi Jordana: Departament de Ciència Animal i dels Aliments, Facultat de Veterinària, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
- Marcel Amills: Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain; Departament de Ciència Animal i dels Aliments, Facultat de Veterinària, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
|
11
|
Kommadath A, Grant JR, Krivushin K, Butty AM, Baes CF, Carthy TR, Berry DP, Stothard P. A large interactive visual database of copy number variants discovered in taurine cattle. Gigascience 2020; 8:5523204. [PMID: 31241156 PMCID: PMC6593363 DOI: 10.1093/gigascience/giz073] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/10/2018] [Revised: 02/27/2019] [Accepted: 05/28/2019] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Copy number variants (CNVs) contribute to genetic diversity and phenotypic variation. We aimed to discover CNVs in taurine cattle using a large collection of whole-genome sequences and to provide an interactive database of the identified CNV regions (CNVRs) that includes visualizations of sequence read alignments, CNV boundaries, and genome annotations. RESULTS CNVs were identified in each of 4 whole-genome sequencing datasets, which together represent >500 bulls from 17 breeds, using a popular multi-sample read-depth-based algorithm, cn.MOPS. Quality control and CNVR construction, performed dataset-wise to avoid batch effects, resulted in 26,223 CNVRs covering 107.75 unique Mb (4.05%) of the bovine genome. Hierarchical clustering of samples by CNVR genotypes indicated clear separation by breeds. An interactive HTML database was created that allows data filtering options, provides graphical and tabular data summaries including Hardy-Weinberg equilibrium tests on genotype proportions, and displays genes and quantitative trait loci at each CNVR. Notably, the database provides sequence read alignments at each CNVR genotype and the boundaries of constituent CNVs in individual samples. Besides numerous novel discoveries, we corroborated the genotypes reported for a CNVR at the KIT locus known to be associated with the piebald coat colour phenotype in Hereford and some Simmental cattle. CONCLUSIONS We present a large comprehensive collection of taurine cattle CNVs in a novel interactive visual database that displays CNV boundaries, read depths, and genome features for individual CNVRs, thus providing users with a powerful means to explore and scrutinize CNVRs of interest more thoroughly.
Affiliation(s)
- Arun Kommadath: Department of Agricultural, Food and Nutritional Science (AFNS), University of Alberta, Edmonton, AB, Canada; Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, Lacombe, Alberta, Canada
- Jason R Grant: Department of Agricultural, Food and Nutritional Science (AFNS), University of Alberta, Edmonton, AB, Canada
- Kirill Krivushin: Department of Agricultural, Food and Nutritional Science (AFNS), University of Alberta, Edmonton, AB, Canada
- Adrien M Butty: Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
- Christine F Baes: Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada; Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Tara R Carthy: Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Fermoy, Ireland
- Donagh P Berry: Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Fermoy, Ireland
- Paul Stothard: Department of Agricultural, Food and Nutritional Science (AFNS), University of Alberta, Edmonton, AB, Canada
|
12
|
Lauer S, Gresham D. An evolving view of copy number variants. Curr Genet 2019; 65:1287-1295. [PMID: 31076843 DOI: 10.1007/s00294-019-00980-0] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 03/01/2019] [Revised: 04/17/2019] [Accepted: 04/20/2019] [Indexed: 01/08/2023]
Abstract
Copy number variants (CNVs) are regions of the genome that vary in integer copy number. CNVs, which comprise both amplifications and deletions of DNA sequence, have been identified across all domains of life, from bacteria and archaea to plants and animals. CNVs are an important source of genetic diversity, and can drive rapid adaptive evolution and progression of heritable and somatic human diseases, such as cancer. However, despite their evolutionary importance and clinical relevance, CNVs remain understudied compared to single-nucleotide variants (SNVs). This is a consequence of the inherent difficulties in detecting CNVs at low-to-intermediate frequencies in heterogeneous populations of cells. Here, we discuss molecular methods used to detect CNVs, the limitations associated with using these techniques, and the application of new and emerging technologies that present solutions to these challenges. The goal of this short review and perspective is to highlight aspects of CNV biology that are understudied and define avenues for further research that address specific gaps in our knowledge of these complex alleles. We describe our recently developed method for CNV detection in which a fluorescent gene functions as a single-cell CNV reporter and present key findings from our evolution experiments in Saccharomyces cerevisiae. Using a CNV reporter, we found that CNVs are generated at a high rate and undergo selection with predictable dynamics across independently evolving replicate populations. Many CNVs appear to be generated through DNA replication-based processes that are mediated by the presence of short, interrupted, inverted-repeat sequences. Our results have important implications for the role of CNVs in evolutionary processes and the molecular mechanisms that underlie CNV formation. We discuss the possible extension of our method to other applications, including tracking the dynamics of CNVs in models of human tumors.
Affiliation(s)
- Stephanie Lauer: Institute for Systems Genetics, New York University Langone Health, New York, NY, USA
- David Gresham: Center for Genomics and System Biology, Department of Biology, New York University, New York, NY, USA
|
13
|
Lopdell TJ, Tiplady K, Couldrey C, Johnson TJJ, Keehan M, Davis SR, Harris BL, Spelman RJ, Snell RG, Littlejohn MD. Multiple QTL underlie milk phenotypes at the CSF2RB locus. Genet Sel Evol 2019; 51:3. [PMID: 30678637 PMCID: PMC6346582 DOI: 10.1186/s12711-019-0446-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/13/2018] [Accepted: 01/10/2019] [Indexed: 12/30/2022] Open
Abstract
Background: Over many years, artificial selection has substantially improved milk production by cows. However, the genes that underlie milk production quantitative trait loci (QTL) remain relatively poorly characterised. Here, we investigate a previously reported QTL located at the CSF2RB locus on chromosome 5, for several milk production phenotypes, to better understand its underlying genetic and molecular causes. Results: Using a population of 29,350 taurine dairy cows, we conducted association analyses for milk yield and composition traits, and identified highly significant QTL for milk yield, milk fat concentration, and milk protein concentration. Strikingly, protein concentration and milk yield appear to show co-located yet genetically distinct QTL. To attempt to understand the molecular mechanisms that might be mediating these effects, gene expression data were used to investigate eQTL for 11 genes in the broader interval. This analysis highlighted genetic impacts on CSF2RB and NCF4 expression that share similar association signatures to those observed for lactation QTL, strongly implicating one or both of these genes as responsible for these effects. Using the same gene expression dataset representing 357 lactating cows, we also identified 38 novel RNA editing sites in the 3′ UTR of CSF2RB transcripts. The extent to which two of these sites were edited also appears to be genetically co-regulated with lactation QTL, highlighting a further layer of regulatory complexity that involves the CSF2RB gene. Conclusions: This locus presents a diversity of molecular and lactation QTL, likely representing multiple overlapping effects that, at a minimum, highlight the CSF2RB gene as having a causal role in these processes.
Affiliation(s)
- Thomas J Lopdell: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand; School of Biological Sciences, University of Auckland, Symonds Street, Auckland, New Zealand
- Kathryn Tiplady: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand
- Christine Couldrey: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand
- Thomas J J Johnson: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand
- Michael Keehan: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand
- Stephen R Davis: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand
- Bevin L Harris: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand
- Richard J Spelman: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand
- Russell G Snell: School of Biological Sciences, University of Auckland, Symonds Street, Auckland, New Zealand
- Mathew D Littlejohn: Research and Development, Livestock Improvement Corporation, Ruakura Road, Hamilton, New Zealand
|