1
He Y, Zhang X, Peng MS, Li YC, Liu K, Zhang Y, Mao L, Guo Y, Ma Y, Zhou B, Zheng W, Yue T, Liao Y, Liang SA, Chen L, Zhang W, Chen X, Tang B, Yang X, Ye K, Gao S, Lu Y, Wang Y, Wan S, Hao R, Wang X, Mao Y, Dai S, Gao Z, Yang LQ, Guo J, Li J, Liu C, Wang J, Sovannary T, Bunnath L, Kampuansai J, Inta A, Srikummool M, Kutanan W, Ho HQ, Pham KD, Singthong S, Sochampa S, Kyaing UW, Pongamornkul W, Morlaeku C, Rattanakrajangsri K, Kong QP, Zhang YP, Su B. Genome diversity and signatures of natural selection in mainland Southeast Asia. Nature 2025:10.1038/s41586-025-08998-w. [PMID: 40369069 DOI: 10.1038/s41586-025-08998-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Accepted: 04/09/2025] [Indexed: 05/16/2025]
Abstract
Mainland Southeast Asia (MSEA) has rich ethnic and cultural diversity with a population of nearly 300 million [1,2]. However, people from MSEA are underrepresented in current human genomic databases. Here we present the SEA3K genome dataset (phase I), generated by deep short-read whole-genome sequencing of 3,023 individuals from 30 MSEA populations, and long-read whole-genome sequencing of 37 representative individuals. We identified 79.59 million small variants and 96,384 structural variants, among which 22.83 million small variants and 24,622 structural variants are unique to this dataset. We observed high genetic heterogeneity across MSEA populations, reflected by the varied combinations of genetic components. We identified 44 genomic regions with strong signatures of Darwinian positive selection, covering 89 genes involved in varied physiological systems such as physical traits and immune response. Furthermore, we observed varied patterns of archaic Denisovan introgression in MSEA populations, supporting the proposal of at least two distinct instances of Denisovan admixture into modern humans in Asia [3]. We also detected genomic regions that suggest adaptive archaic introgressions in MSEA populations. The large number of novel genomic variants in MSEA populations highlights the necessity of studying regional populations that can help answer key questions related to prehistory, genetic adaptation and complex diseases.
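As a rough reading aid, the share of dataset-unique variants can be derived from the counts quoted in the abstract. The sketch below is only that arithmetic; the helper name `share` is illustrative and not taken from the paper.

```python
# Counts quoted in the abstract above.
small_total, small_unique = 79.59e6, 22.83e6
sv_total, sv_unique = 96_384, 24_622


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


small_pct = share(small_unique, small_total)
sv_pct = share(sv_unique, sv_total)
print(f"dataset-unique small variants: {small_pct:.1f}%")
print(f"dataset-unique structural variants: {sv_pct:.1f}%")
```

On these figures, a bit more than a quarter of both variant classes are unique to the dataset.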
Affiliation(s)
- Yaoxi He
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Yunnan Key Laboratory of Integrative Anthropology, Kunming, China
- Xiaoming Zhang
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Yunnan Key Laboratory of Integrative Anthropology, Kunming, China
- Min-Sheng Peng
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Yu-Chun Li
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China
- Kunming Key Laboratory of Healthy Aging Study, Kunming, China
- Kai Liu
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Yu Zhang
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Leyan Mao
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Yongbo Guo
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Yujie Ma
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Bin Zhou
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Wangshan Zheng
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Tian Yue
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Yuwen Liao
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Shen-Ao Liang
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Science, Fudan University, Shanghai, China
- Lu Chen
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Science, Fudan University, Shanghai, China
- Weijie Zhang
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Xiaoning Chen
- National Genomics Data Center, China National Center for Bioinformation, Beijing, China
- Bixia Tang
- National Genomics Data Center, China National Center for Bioinformation, Beijing, China
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Center for Mathematical Medical, the First Affiliated Hospital, Xi'an Jiaotong University, Xi'an, China
- Kai Ye
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Center for Mathematical Medical, the First Affiliated Hospital, Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Genome Institute, the First Affiliated Hospital, Xi'an Jiaotong University, Xi'an, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Faculty of Science, Leiden University, Leiden, The Netherlands
- Shenghan Gao
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yurun Lu
- CEMS, NCMIS, HCMS, MADIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Yong Wang
- CEMS, NCMIS, HCMS, MADIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Shijie Wan
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Rushan Hao
- School of Medicine, Yunnan University, Kunming, China
- Xuankai Wang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University, Yiwu, China
- Shanshan Dai
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Zongliang Gao
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Kunming Key Laboratory of Healthy Aging Study, Kunming, China
- Li-Qin Yang
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Yunnan Key Laboratory of Integrative Anthropology, Kunming, China
- Kunming Key Laboratory of Healthy Aging Study, Kunming, China
- Jianxin Guo
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Jiangguo Li
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Chao Liu
- Laboratory Animal Center, Kunming Institute of Zoology, the Chinese Academy of Sciences, Kunming, China
- National Resource Center for Non-Human Primates, Kunming, China
- Jianhua Wang
- Department of Anthropology, School of Sociology, Yunnan Minzu University, Kunming, China
- Tuot Sovannary
- Department of Geography and Land Management, Royal University of Phnom Penh, Phnom Penh, Cambodia
- Long Bunnath
- Department of Geography and Land Management, Royal University of Phnom Penh, Phnom Penh, Cambodia
- Jatupol Kampuansai
- Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
- Angkhana Inta
- Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
- Metawee Srikummool
- Department of Biochemistry, Faculty of Medical Science, Naresuan University, Phitsanulok, Thailand
- Wibhu Kutanan
- Department of Biology, Faculty of Science, Naresuan University, Phitsanulok, Thailand
- Huy Quang Ho
- Department of Immunology, Ha Noi Medical University, Ha Noi, Vietnam
- Khoa Dang Pham
- Department of Immunology, Ha Noi Medical University, Ha Noi, Vietnam
- U Win Kyaing
- Field School of Archaeology, Paukkhaung, Myanmar
- Wittaya Pongamornkul
- Queen Sirikit Botanic Garden (QSBG), The Botanical Garden Organization, Chiang Mai, Thailand
- Chutima Morlaeku
- Inter Mountain Peoples Education and Culture in Thailand Association (IMPECT), Sansai, Thailand
- Qing-Peng Kong
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
- KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China.
- Kunming Key Laboratory of Healthy Aging Study, Kunming, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
- Ya-Ping Zhang
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
- KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China.
- University of Chinese Academy of Sciences, Beijing, China.
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, Kunming, China.
- Bing Su
- State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
- Yunnan Key Laboratory of Integrative Anthropology, Kunming, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
2
Venkatesan RT, Rani A, Umesh S, Sushil K, Kumar DA, Sowmya P, Kesavan M, Singh RB, Panchasara HH, Kumar SA, Chhaya R. Genome-wide scan for SNPs and selective sweeps reveals candidate genes and QTLs for milk production and reproduction traits in Indian Kankrej cattle. 3 Biotech 2025; 15:90. [PMID: 40092452 PMCID: PMC11909306 DOI: 10.1007/s13205-025-04263-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 03/04/2025] [Indexed: 03/19/2025] Open
Abstract
Genome-wide identification and annotation of SNPs and selective sweeps were carried out in Kankrej cattle using the ddRAD sequencing method. A total of 1,983,581 SNPs were identified, and nearly half (48.81%) of their effects were found in intronic regions. Around 624 SNPs annotated in 215 candidate genes were associated with various milk production and reproduction traits. The observed degree of heterozygosity was 0.2907 against an expected heterozygosity of 0.3216. A total of 300 candidate selective sweeps were identified, and functional profiling of genes in the selective sweep regions yielded 20 significant (adjusted p < 0.05) functions. Functional annotation assigned 53.2% of QTLs to milk association, 15.33% to production association, 10.68% to reproduction association, and 8.4% to exterior association. The functional enrichment analysis revealed significant QTLs on 14 chromosomes. The QTL for milk protein percentage was the most significant milk-related QTL, along with those for milk potassium content, milk casein percentage, milk yield, and milk fat yield, among others. The interval to first estrus after calving, age at puberty, calving interval, conception rate, and birth index were some of the significant QTLs identified for reproduction traits. Genes related to keratinization indicated a selection signature related to environmental stressors, contributing to the adaptation of the animals to tropical climatic conditions. Supplementary Information: The online version contains supplementary material available at 10.1007/s13205-025-04263-z.
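The gap between the two heterozygosity values reported above is often summarized as an inbreeding coefficient, F = 1 - Ho/He. The abstract does not report F; the sketch below simply applies that standard formula to the quoted numbers, and the function name is illustrative.

```python
def inbreeding_coefficient(h_obs, h_exp):
    """Heterozygote deficit relative to expectation: F = 1 - Ho / He."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# Values quoted in the abstract (treated here as Ho and He).
f_is = inbreeding_coefficient(0.2907, 0.3216)
print(f"F = {f_is:.4f}")
```

A positive F indicates fewer heterozygotes than expected under random mating, which is the direction seen here.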
Affiliation(s)
- Alex Rani
- ICAR-National Dairy Research Institute, Karnal, Haryana India
- Singh Umesh
- Bihar Animal Sciences University, Patna, Bihar India
- Kumar Sushil
- ICAR-Central Institute for Research on Cattle, Meerut, Uttar Pradesh India
- Das Achintya Kumar
- ICAR-Central Institute for Research on Cattle, Meerut, Uttar Pradesh India
- Pulapet Sowmya
- Oneomics Private Limited, Bharathidasan University Technology Park, Khajamalai Campus, Tiruchirappalli, Tamil Nadu India
- Markkandan Kesavan
- Oneomics Private Limited, Bharathidasan University Technology Park, Khajamalai Campus, Tiruchirappalli, Tamil Nadu India
- H. H. Panchasara
- Livestock Research Station, Kamdhenu University, Dantiwada, Gujarat India
- Singh Amit Kumar
- ICAR-Central Institute for Research on Cattle, Meerut, Uttar Pradesh India
- Rani Chhaya
- ICAR-Central Institute for Research on Cattle, Meerut, Uttar Pradesh India
3
Freund FD, Gates D, Johnson MG, Rothfels CJ. Phylogenetics and population structure of the western North American endemic Pacific Laurasian clade of Isoëtes. American Journal of Botany 2025; 112:e70030. [PMID: 40237371 DOI: 10.1002/ajb2.70030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2024] [Revised: 01/24/2025] [Accepted: 01/27/2025] [Indexed: 04/18/2025]
Abstract
PREMISE: Isoëtes is a genus of small, semi-woody, hydrophilic, heterosporous lycophytes with a cosmopolitan distribution. However, local populations tend to be found in narrow, patchy, and highly fragmented mesic to aquatic habitats, many of which are currently under threat. In this study, we sought to uncover how this patchy distribution has affected the evolutionary history of one of the two lineages of Isoëtes found on the West Coast of North America, the Pacific Laurasian clade (PLC). METHODS: We used a combination of population genetic and multilocus molecular phylogenetic approaches to infer the relationships among the three described species in this clade and to determine the degree of isolation among the sampled populations. RESULTS: We discovered that the populations studied are highly structured and that two of the species, as currently circumscribed, are not monophyletic. Instead, our phylogenetic results suggest that there are at least eight distinct "species-level" clades within the PLC. Of these eight, five appear to have been the result of a rapid radiation. CONCLUSIONS: Our results suggest that the existing taxonomy does not reflect the actual diversity in the PLC and warrants further investigation.
Affiliation(s)
- Forrest D Freund
- Department of Integrative Biology, University of California, Berkeley, 1001 Valley Life Sciences Building, Berkeley, 94720-2465, CA, USA
- Daniel Gates
- Department of Evolution and Ecology and Center for Population Biology, University of California, Davis, Davis, 97616, CA, USA
- Matthew G Johnson
- Department of Biological Sciences, Texas Tech University, 2901 Main Street, Lubbock, 79409-3131, TX, USA
- Carl J Rothfels
- Ecology Center and Department of Biology, Utah State University, BNR 117, 5305 Old Main Hill, Logan, 84322-0300, UT, USA
4
Zhang F, Liu X, Xia H, Wu H, Zong Y, Li H. Identification of genetic loci for growth and stem form traits in hybrid Liriodendron via a genome-wide association study. Forestry Research 2025; 5:e001. [PMID: 40028428 PMCID: PMC11870303 DOI: 10.48130/forres-0025-0001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 11/11/2024] [Accepted: 01/03/2025] [Indexed: 03/05/2025]
Abstract
A key objective of forest tree breeding programs is to enhance traits related to growth and stem form, so as to cultivate plantations that exhibit rapid growth, straight trunks with minimal taper, and superior wood quality to meet the demands of modern timber production. Liriodendron species exhibit notable heterosis in interspecific hybrids, with hybrid Liriodendron displaying rapid growth rates, straight trunks, and wide adaptability. However, the genetic architecture underlying growth and stem form traits remains unclear, hindering the progress of genetic improvement efforts. A genome-wide association study (GWAS) is an effective approach for identifying target genes and clarifying genetic architectures. In this study, an artificial population of 233 hybrid progeny derived from 25 hybrid combinations was resequenced to obtain genome-wide single nucleotide polymorphism (SNP) and insertion and deletion (InDel) variants. After filtering, a total of 192,972 SNP loci and 60,666 InDel loci were obtained, which were subsequently analyzed for associations using the R package GAPIT. We identified 97 significant SNP loci and 58 significant InDel loci (-log10(P) ≥ 4.50), culminating in the identification of 161 candidate genes. The functions of these candidate genes were annotated, revealing potential associations of the Lchi_2g03172 and Lchi_10g19986 genes with the growth of hybrid Liriodendron, and highlighting the potential influence of the Lchi_16g30522 gene on the growth and branching of hybrid Liriodendron. Overall, this study serves as a foundational step towards unraveling the genetic architecture underpinning growth and stem form in Liriodendron plants.
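To make the significance cutoff above concrete, the sketch below converts the -log10(P) threshold of 4.50 into a raw P value. For context only, it also computes what a Bonferroni cutoff at alpha = 0.05 over the 192,972 filtered SNP loci would be. The Bonferroni comparison is an illustration added here, not a criterion stated in the abstract.

```python
import math

THRESHOLD = 4.50  # -log10(P) cutoff quoted in the abstract


def neg_log10(p):
    """Return -log10 of a P value."""
    return -math.log10(p)


def is_significant(p, threshold=THRESHOLD):
    """True when the P value clears the -log10(P) cutoff."""
    return neg_log10(p) >= threshold


# Raw P value equivalent to the quoted cutoff.
p_cutoff = 10 ** -THRESHOLD
# Illustrative Bonferroni cutoff (assumed alpha = 0.05, SNP loci only).
bonferroni = neg_log10(0.05 / 192_972)

print(f"P cutoff: {p_cutoff:.3e}")
print(f"Bonferroni -log10(P): {bonferroni:.2f}")
```

Under these assumptions the quoted threshold corresponds to P of about 3.2e-5, which is more permissive than the Bonferroni value of about 6.59.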
Affiliation(s)
- Fengchao Zhang
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Xiao Liu
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Hui Xia
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Hainan Wu
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Yaxian Zong
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Huogen Li
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
5
Gandotra N, Tyagi A, Tikhonova I, Storer C, Scharfe C. CFTR haplotype phasing using long-read genome sequencing from ultralow input DNA. Genetics in Medicine Open 2025; 3:101962. [PMID: 40027236 PMCID: PMC11869909 DOI: 10.1016/j.gimo.2025.101962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 01/02/2025] [Accepted: 01/03/2025] [Indexed: 03/05/2025]
Abstract
Purpose: Newborn screening identifies rare diseases that result from the recessive inheritance of pathogenic variants in both copies of a gene. Long-read genome sequencing (LRS) is used for identifying and phasing genomic variants, but further efforts are needed to develop LRS for applications using low-yield DNA samples. Methods: In this study, genomic DNA with high molecular weight was obtained from 2 cystic fibrosis patients, comprising a whole-blood sample (CF1) and a newborn dried blood spot sample (CF2). Library preparation and genome sequencing (30-fold coverage) were performed using 20 ng of DNA input on both the PacBio Revio system and the Illumina NovaSeq short-read sequencer. Single-nucleotide variants, small indels, and structural variants were identified for each data set. Results: Our results indicated that the genotype concordance between long- and short-read genome sequencing data was higher for single-nucleotide variants than for small indels. Both technologies accurately identified known pathogenic variants in the CFTR gene (CF1: p.(Met607_Gln634del), p.(Phe508del); CF2: p.(Phe508del), p.(Ala455Glu)) with complete concordance for the polymorphic poly-TG and consecutive poly-T tracts. Using PacBio read-based haplotype phasing, we successfully determined the allelic phase and identified compound heterozygosity of pathogenic variants at genomic distances of 32.4 kb (CF1) and 10.8 kb (CF2). Conclusion: Haplotype phasing of rare pathogenic variants from minimal DNA input is achieved through LRS. This approach has the potential to eliminate the need for parental testing, thereby shortening the time to diagnosis in genetic disease screening.
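The compound-heterozygosity call described above comes down to asking whether two heterozygous variants sit on opposite haplotypes (in trans) or on the same one (in cis). The sketch below shows that check on VCF-style phased genotype strings such as `0|1`. It assumes both genotypes belong to the same phase set, and the function name is hypothetical. It is not the pipeline used in the study.

```python
def phase_relation(gt_a: str, gt_b: str) -> str:
    """Classify two phased diploid genotypes as 'cis' or 'trans'.

    Assumes VCF-style strings ('0|1', '1|0') from the same phase set.
    """
    a, b = gt_a.split("|"), gt_b.split("|")
    if len(a) != 2 or len(b) != 2:
        return "unphased"  # e.g. '0/1' carries no phase information
    if sorted(a) != ["0", "1"] or sorted(b) != ["0", "1"]:
        return "not both heterozygous"
    # Same haplotype order means both ALT alleles share one chromosome.
    return "cis" if a == b else "trans"


# ALT alleles on opposite haplotypes: consistent with compound heterozygosity.
print(phase_relation("0|1", "1|0"))
# ALT alleles on the same haplotype: one gene copy remains unaffected.
print(phase_relation("0|1", "0|1"))
```

Only the trans configuration leaves both gene copies carrying a pathogenic variant, which is why read-based phasing can stand in for parental testing in this setting.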
Affiliation(s)
- Neeru Gandotra
- Department of Genetics, Yale School of Medicine, New Haven, CT
- SUNY Upstate Medical University, Syracuse, NY
- Antariksh Tyagi
- Department of Genetics, Yale School of Medicine, New Haven, CT
- Irina Tikhonova
- Department of Genetics, Yale School of Medicine, New Haven, CT
- Curt Scharfe
- Department of Genetics, Yale School of Medicine, New Haven, CT
6
Yang F, Lang T, Wu J, Zhang C, Qu H, Pu Z, Yang F, Yu M, Feng J. SNP loci identification and KASP marker development system for genetic diversity, population structure, and fingerprinting in sweetpotato (Ipomoea batatas L.). BMC Genomics 2024; 25:1245. [PMID: 39719557 DOI: 10.1186/s12864-024-11139-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Accepted: 12/09/2024] [Indexed: 12/26/2024] Open
Abstract
Sweetpotato (Ipomoea batatas L.), an important food and industrial crop in the world, has a highly heterozygous hexaploid genome, making the development of single nucleotide polymorphism (SNP) markers challenging. Identifying SNP loci and developing practical SNP markers are crucial for genomic and genetic research on sweetpotato. A restriction site-associated DNA sequencing analysis of 60 sweetpotato accessions in this study yielded about 7.97 million SNPs. Notably, 954 candidate SNPs were obtained from 21,681 high-quality SNPs. Based on their stability and polymorphism, 274 kompetitive allele specific PCR (KASP) markers were then developed and uniformly distributed on chromosomes. The 274 KASP markers were used to genotype 93 sweetpotato accessions to evaluate their utility for assessing germplasm and analyzing genetic diversity and population structures. These markers had respective mean values of 0.24, 0.34, 0.31, and 0.25 for minor allele frequency, heterozygosity, gene diversity, and polymorphic information content (PIC). Their genetic pedigree led to the division of all accessions into three primary clusters, which were found to be both interrelated and independent. Finally, 74 KASP markers with PIC values greater than 0.35 were selected as core markers. These markers were used to construct the DNA fingerprints of 93 sweetpotato accessions and were able to differentiate between all accessions. To the best of our knowledge, this is the first attempt at the development and application of KASP markers in sweetpotato. However, due to sweetpotato's polyploidy, heterozygosity and the complex genome, the KASP marker conversion rate in this study was relatively low. To improve the KASP marker conversion rate, and accuracies in SNP discovery and marker validation, further studies including more accessions from underrepresented regions are needed in sweetpotato.
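The marker statistics above (gene diversity and PIC) can be illustrated with their textbook definitions. The sketch below assumes the commonly used formulas, gene diversity = 1 - sum(p_i^2) and PIC = gene diversity - sum over allele pairs of 2 p_i^2 p_j^2, applied to a biallelic marker. The abstract does not state which formulas or software the authors used, so this is an assumption.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = gene diversity - sum over allele pairs of 2 * p_i^2 * p_j^2."""
    n = len(freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return gene_diversity(freqs) - cross


# A biallelic marker is most informative when both alleles are equally common.
maf = 0.5
balanced = pic([maf, 1.0 - maf])
print(f"PIC at MAF 0.5: {balanced:.3f}")
```

Under these definitions a biallelic marker cannot exceed a PIC of 0.375, so the 0.35 cutoff for core markers selects markers close to that ceiling.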
Affiliation(s)
- Feiyang Yang
- Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 610011, China
- School of life science and engineering, Southwest University of Science and Technology, Mianyang, Sichuan, 621010, China
- Tao Lang
- Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 610011, China
- Jingyu Wu
- Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 610011, China
- Cong Zhang
- Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 610011, China
- Huijuan Qu
- Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 610011, China
- Zhigang Pu
- Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 610011, China
- Fan Yang
- Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 610011, China
- Ma Yu
- School of life science and engineering, Southwest University of Science and Technology, Mianyang, Sichuan, 621010, China
- Junyan Feng
- Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 610011, China.
7
Rabbani MAG, Vallejo-Trujillo A, Wu Z, Miedzinska K, Faruque S, Watson KA, Smith J. Whole genome sequencing of three native chicken varieties (Common Deshi, Hilly and Naked Neck) of Bangladesh. Sci Data 2024; 11:1432. [PMID: 39719437 DOI: 10.1038/s41597-024-04291-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Accepted: 12/04/2024] [Indexed: 12/26/2024] Open
Abstract
Bangladeshi indigenous chicken varieties (Common Deshi, Hilly and Naked Neck) are notable for their egg production, meat quality, extraordinary survivability and disease resistance. However, the potential to harness their unique genetic merits is being eroded by various factors, including crossbreeding. In-depth genomic studies have not been carried out on these breeds so far. To this end, blood samples and associated phenotypic metadata have been collected from local, unimproved birds sampled from 8 different locations across the country, and from Bangladesh Livestock Research Institute (BLRI)-improved chickens of the same breeds. Whole Genome Sequencing (WGS) of 96 selected samples, representing local and improved populations of each breed, has been carried out. Around 22 M high-quality SNPs have been identified, with 25% of these being novel variants previously undescribed in public databases. This data set will allow for genetic comparison between breeds, and between selected and unimproved birds, providing a resource for genomic selection in Bangladeshi breeding schemes to create more productive and resilient poultry stock.
Affiliation(s)
- Md Ataul Goni Rabbani
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Poultry Production Research Division, Bangladesh Livestock Research Institute (BLRI), Savar, Dhaka, 1341, Bangladesh.
- Adriana Vallejo-Trujillo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Zhou Wu
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Katarzyna Miedzinska
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Shakila Faruque
- Poultry Production Research Division, Bangladesh Livestock Research Institute (BLRI), Savar, Dhaka, 1341, Bangladesh
- Kellie A Watson
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Jacqueline Smith
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
8
Ballesio F, Pepe G, Ausiello G, Novelletto A, Helmer-Citterich M, Gherardini PF. Human lncRNAs harbor conserved modules embedded in different sequence contexts. Noncoding RNA Res 2024; 9:1257-1270. [PMID: 39040814 PMCID: PMC11261117 DOI: 10.1016/j.ncrna.2024.06.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 06/11/2024] [Accepted: 06/19/2024] [Indexed: 07/24/2024] Open
Abstract
We analyzed the structure of human long non-coding RNA (lncRNA) genes to investigate whether the non-coding transcriptome is organized in modular domains, as is the case for protein-coding genes. To this end, we compared all known human lncRNA exons and identified 340 pairs of exons with high sequence and/or secondary structure similarity but embedded in a dissimilar sequence context. We grouped these pairs into 106 clusters based on their reciprocal similarities. These shared modules are highly conserved between humans and the four great ape species, display evidence of purifying selection and likely arose as a result of recent segmental duplications. Our analysis contributes to the understanding of the mechanisms driving the evolution of the non-coding genome and suggests additional strategies towards deciphering the functional complexity of this class of molecules.
Collapse
Affiliation(s)
- Francesco Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Gerardo Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Gabriele Ausiello
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Andrea Novelletto
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
9
Kamaraj V, Sinha H. SCI-VCF: a cross-platform GUI solution to summarize, compare, inspect and visualize the variant call format. NAR Genom Bioinform 2024; 6:lqae083. [PMID: 38984067 PMCID: PMC11231579 DOI: 10.1093/nargab/lqae083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 06/03/2024] [Accepted: 07/01/2024] [Indexed: 07/11/2024] Open
Abstract
As genomics advances swiftly and its applications extend to diverse fields, bioinformatics tools must enable researchers and clinicians to work with genomic data irrespective of their programming expertise. We developed SCI-VCF, a Shiny-based comprehensive analysis utility to summarize, compare, inspect, analyse and design interactive visualizations of the genetic variants from the variant call format. With an intuitive graphical user interface, SCI-VCF aims to bridge the approachability gap in genomics that arises from the existing predominantly command-line utilities. SCI-VCF is written in R and is freely available at https://doi.org/10.5281/zenodo.11453080. For installation-free access, users can avail themselves of an online version at https://ibse.shinyapps.io/sci-vcf-online.
Affiliation(s)
- Venkatesh Kamaraj
- Centre for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai 600036, Tamil Nadu, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai 600036, Tamil Nadu, India
- Himanshu Sinha
- Centre for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai 600036, Tamil Nadu, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai 600036, Tamil Nadu, India
- Wadhwani School of Data Science and Artificial Intelligence, IIT Madras, Chennai 600036, Tamil Nadu, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, IIT Madras, Chennai 600036, Tamil Nadu, India
10
Takayama J, Makino S, Funayama T, Ueki M, Narita A, Murakami K, Orui M, Ishikuro M, Obara T, Kuriyama S, Yamamoto M, Tamiya G. A fine-scale genetic map of the Japanese population. Clin Genet 2024; 106:284-292. [PMID: 38719617 DOI: 10.1111/cge.14536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 04/13/2024] [Accepted: 04/15/2024] [Indexed: 08/13/2024]
Abstract
Genetic maps are fundamental resources for linkage and association studies. A fine-scale genetic map can be constructed from population-scale sequencing data by inferring historical recombination events from the genome-wide structure of linkage disequilibrium (a non-random association of alleles among loci). We constructed a fine-scale genetic map and identified recombination hotspots from 10 092 551 bi-allelic high-quality autosomal markers segregating among 150 unrelated Japanese individuals. Their genotypes were determined by high-coverage (30×) whole-genome sequencing, and genotype quality was carefully controlled by using their parents' and offspring's genotypes. The pedigree information was also utilized for haplotype phasing. The resulting genome-wide recombination rate profiles were concordant with those of the worldwide population on a broad scale, and the resolution was much improved. We identified 9487 recombination hotspots and confirmed the enrichment of previously known motifs in the hotspots. Moreover, we demonstrated that the Japanese genetic map improved the haplotype phasing and genotype imputation accuracy for the Japanese population. The construction of a population-specific genetic map will help make genetics research more accurate.
Affiliation(s)
- Jun Takayama
- Department of AI and Innovative Medicine, Tohoku University School of Medicine, Sendai, Japan
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Satoshi Makino
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Takamitsu Funayama
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Masao Ueki
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Akira Narita
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Keiko Murakami
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Masatsugu Orui
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, Sendai, Japan
- Mami Ishikuro
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, Sendai, Japan
- Taku Obara
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, Sendai, Japan
- Shinichi Kuriyama
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, Sendai, Japan
- Masayuki Yamamoto
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Gen Tamiya
- Department of AI and Innovative Medicine, Tohoku University School of Medicine, Sendai, Japan
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
11
González-Prendes R, Pena RN, Richart C, Nadal J, Ros-Freixedes R. Long-read de novo assembly of the red-legged partridge (Alectoris rufa) genome. Sci Data 2024; 11:908. [PMID: 39191744 PMCID: PMC11349902 DOI: 10.1038/s41597-024-03659-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 07/18/2024] [Indexed: 08/29/2024] Open
Abstract
The red-legged partridge (Alectoris rufa) is a popular game bird species that is in decline in several regions of southwestern Europe. The introduction of farm-reared individuals of a distinct genetic make-up in hunting reserves can result in genetic swamping of wild populations. Here we present a de novo genome assembly for the red-legged partridge based on long-read sequencing technology. The assembled genome size is 1.14 Gb, with scaffold N50 of 37.6 Mb and contig N50 of 29.5 Mb. Our genome is highly contiguous and contains 97.06% of complete avian core genes. Overall, the quality of this genome assembly is equivalent to those available for other close relatives such as the Japanese quail or the chicken. This genome assembly will contribute to the understanding of genetic dynamics of wild populations of red-legged partridges with releases of farm-reared reinforcements and to appropriate management decisions of such populations.
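The scaffold and contig N50 values quoted in this abstract use a standard contiguity statistic: sort the sequences from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the assembly size. The following is a minimal sketch of that definition only. The function name `n50` and the toy lengths are illustrative assumptions, not part of the authors' assembly pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the halfway mark.
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (arbitrary units), total = 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

Applied to real contig or scaffold lengths (for example, parsed from a FASTA index), the same function would give the contig N50 or scaffold N50 respectively.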
Affiliation(s)
- Rayner González-Prendes
- Animal Breeding and Genomics, Wageningen University & Research, 6708PB, Wageningen, The Netherlands
- Ramona Natacha Pena
- Departament de Ciència Animal, Universitat de Lleida, Lleida, Spain
- Agrotecnio-CERCA Center, Lleida, Spain
- Cristóbal Richart
- Departament de Medicina i Cirurgia, Universitat Rovira i Virgili, Tarragona, Spain
- Jesús Nadal
- Departament de Ciència Animal, Universitat de Lleida, Lleida, Spain.
- Roger Ros-Freixedes
- Departament de Ciència Animal, Universitat de Lleida, Lleida, Spain.
- Agrotecnio-CERCA Center, Lleida, Spain.
12
Tijjani A, Kambal S, Terefe E, Njeru R, Ogugo M, Ndambuki G, Missohou A, Traore A, Salim B, Ezeasor C, D'andre H C, Obishakin ET, Diallo B, Talaki E, Abdoukarim IY, Nash O, Osei-Amponsah R, Ravaorimanana S, Issa Y, Zegeye T, Mukasa C, Tiambo C, Prendergast JGD, Kemp SJ, Han J, Marshall K, Hanotte O. Genomic Reference Resource for African Cattle: Genome Sequences and High-Density Array Variants. Sci Data 2024; 11:801. [PMID: 39030190 PMCID: PMC11271538 DOI: 10.1038/s41597-024-03589-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 07/01/2024] [Indexed: 07/21/2024] Open
Abstract
The diversity in genome resources is fundamental to designing genomic strategies for local breed improvement and utilisation. These resources also support gene discovery and enhance our understanding of the mechanisms of resilience with applications beyond local breeds. Here, we report the genome sequences of 555 cattle (208 of which comprise new data) and high-density (HD) array genotyping of 1,082 samples (537 new samples) from indigenous African cattle populations. The new sequences have an average genome coverage of ~30X, three times higher than the average (~10X) of the over 300 sequences already in the public domain. Following variant quality checks, we identified approximately 32.3 million sequence variants and 661,943 HD autosomal variants mapped to the Bos taurus reference genome (ARS-UCD1.2). The new datasets were generated as part of the Centre for Tropical Livestock Genetics and Health (CTLGH) Genomic Reference Resource for African Cattle (GRRFAC) initiative, which aspires to facilitate the generation of this livestock resource and hopes for its utilisation for complete indigenous breed characterisation and sustainable global livestock improvement.
Affiliation(s)
- Abdulfatai Tijjani
- Centre for Tropical Livestock Genetics and Health (CTLGH), ILRI Ethiopia, P.O. Box 5689, Addis Ababa, Ethiopia.
- The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine, 04609, USA.
- Sumaya Kambal
- Centre for Tropical Livestock Genetics and Health (CTLGH), ILRI Ethiopia, P.O. Box 5689, Addis Ababa, Ethiopia
- Department of Genetics and Animal Breeding, Faculty of Animal Production, University of Khartoum, Khartoum, Sudan
- Endashaw Terefe
- Department of Animal Science, College of Agriculture and Environmental Sciences, Arsi University, Asella, Ethiopia
- Regina Njeru
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
- Moses Ogugo
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
- Gideon Ndambuki
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
- Ayao Missohou
- Ecole Inter-Etats des Sciences et Médecine Vétérinaires (EISMV), Dakar, Sénégal
- Amadou Traore
- Institut de l'Environnement et de Recherches Agricoles (INERA), Ouagadougou, Burkina Faso
- Bashir Salim
- Faculty of Veterinary Medicine, University of Khartoum, Khartoum, Sudan
- Camel Research Center, King Faisal University, Al-Ahsa, Saudi Arabia
- Chukwunonso Ezeasor
- Department of Veterinary Pathology and Microbiology, University of Nigeria, Nsukka, Enugu State, Nigeria
- Claire D'andre H
- Rwanda Agricultural and Animal Resources Development Board, Kigali, Rwanda
- Emmanuel T Obishakin
- Biotechnology Division, National Veterinary Research Institute, Vom, Plateau State, Nigeria
- Essodina Talaki
- École Supérieure d'Agronomie de l'Université de Lomé, Lomé, Togo
- Issaka Y Abdoukarim
- Laboratoire de Biotechnologie Animale et de Technologie des Viandes, Abomey-Calavi, Benin
- Oyekanmi Nash
- Centre for Genomics Research and Innovation, NABDA, Abuja, Nigeria
- Richard Osei-Amponsah
- Department of Animal Science, College of Basic and Applied Sciences, University of Ghana, Legon, Ghana
- Youssouf Issa
- Institut National supérieur des Sciences et Techniques d'Abéché-INSTA/Tchad, Abéché, Chad
- Tsadkan Zegeye
- Mekelle Agricultural Research Center, Tigray Agricultural Research Institute, Mekelle, Ethiopia
- Christopher Mukasa
- National Animal Genetic Resources Centre and Data Bank (NAGRC&DB), Entebbe, Uganda
- Christian Tiambo
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
- James G D Prendergast
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Stephen J Kemp
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
- Jianlin Han
- CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- Yazhouwan National Laboratory, No. 8 Huanjin Road, Yazhou, Sanya, 572024, Hainan, P. R. China
- Karen Marshall
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya.
- Olivier Hanotte
- Centre for Tropical Livestock Genetics and Health (CTLGH), ILRI Ethiopia, P.O. Box 5689, Addis Ababa, Ethiopia.
- Cells, Organism and Molecular Genetics, School of Life Sciences, University of Nottingham, Nottingham, UK.
13
Ganguly A, Amin S, Al-Amin, Tasnim Chowdhury F, Khan H, Riazul Islam M. Whole genome resequencing unveils low-temperature stress tolerance specific genomic variations in jute (Corchorus sp.). J Genet Eng Biotechnol 2024; 22:100376. [PMID: 38797551 PMCID: PMC11015510 DOI: 10.1016/j.jgeb.2024.100376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 03/23/2024] [Accepted: 03/27/2024] [Indexed: 05/29/2024]
Abstract
Jute (Corchorus sp.), a commercially important and eco-friendly crop, is widely cultivated in Bangladesh, India, and China. Some varieties of this tropical plant, such as Corchorus olitorius variety accession no. 2015 (acc. 2015), have been found to be low-temperature tolerant. The current study was designed to explore the genome-wide variations present in the tolerant plant acc. 2015 in comparison to the sensitive, farmer-popular variety Corchorus olitorius var. O9897 using whole genome resequencing. Among the different variations, intergenic single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels) were found in the highest percentage, whereas approximately 3% of SNPs and 2% of InDels were found in exonic regions in both plants. Gene enrichment analysis indicated the presence of acc. 2015-specific SNPs in the genes encoding peroxidase, ER lumen protein retaining receptor, and hexosyltransferase, which are involved in stress response (GO:0006950); these SNPs were not present in the sensitive variety O9897. In addition, distinctive copy number variation regions (CNVRs) comprising 120 gene loci were found in acc. 2015 with a gain of function from multiple copy numbers but were absent in O9897. Gene ontology analysis revealed that these gene loci encode different receptor-like kinases, helicases, phosphatases, transcription factors (especially Myb transcription factors), regulatory proteins containing different binding domains, annexin, laccase, acyl carrier protein, potassium transporter, and vesicular transporter proteins that are responsible for low-temperature-induced adaptation pathways in plants. Identifying genomic variations linked to cold stress tolerance traits will help to develop successful markers that will pave the way to develop genetically modified cold-resistant jute lines for year-round cultivation to meet the demand for a sustainable fiber crop economy.
Affiliation(s)
- Athoi Ganguly
- Molecular Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
- Shaheena Amin
- Molecular Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh; Department of Biochemistry and Molecular Biology, National Institute of Science and Technology, Dhaka, Bangladesh
- Al-Amin
- Molecular Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
- Farhana Tasnim Chowdhury
- Molecular Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
- Haseena Khan
- Molecular Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh.
- Mohammad Riazul Islam
- Molecular Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh.
14
Tandon S, Sharma M, Kasar P, Kala A. A cloud-based precision oncology framework for whole genome sequence analysis. Comput Biol Chem 2024; 110:108062. [PMID: 38554501 DOI: 10.1016/j.compbiolchem.2024.108062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 03/05/2024] [Accepted: 03/25/2024] [Indexed: 04/01/2024]
Abstract
Cancer is a wide-ranging group of diseases with a high mortality rate worldwide. Early detection and correct precision treatment, a major concern for cancer patients, can change this scenario. By analyzing a patient's genome, clinicians can identify the best-suited treatments, which treat the patient effectively and minimize the chance of side effects. We have therefore developed a fast, robust, and efficient precision oncology framework based on whole genome sequencing of the individual's DNA. This platform can perform the entire genomic analysis, starting from the quality assessment of the input file through to variant annotation and functional prediction, followed by a certain level of interpretation. This analysis helps in the molecular profiling of tumors for the identification of targetable alterations. It takes a FASTQ or BAM file as input and provides two output reports: a primary report, which consists of the patient's details and a summary of the analysis, and a secondary report, a detailed report comprising numerous results obtained from the analysis, such as base changes, codon changes, amino acid changes, TMB analysis, MSI analysis, the variant frequency with its effects and impacts, affected biomarkers, etc. This framework can be effectively utilized for cancer treatment guidance, identification and validation of novel biomarkers, oncology research & development, genomic analysis, and gene manipulation.
Affiliation(s)
- Saloni Tandon
- Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India.
- Medha Sharma
- Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Pratik Kasar
- Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Anirudh Kala
- Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
15
Mikhaylova V, Rzepka M, Kawamura T, Xia Y, Chang PL, Zhou S, Paasch A, Pham L, Modi N, Yao L, Perez-Agustin A, Pagans S, Boles TC, Lei M, Wang Y, Garcia-Bassets I, Chen Z. Targeted phasing of 2-200 kilobase DNA fragments with a short-read sequencer and a single-tube linked-read library method. Sci Rep 2024; 14:7988. [PMID: 38580715 PMCID: PMC10997766 DOI: 10.1038/s41598-024-58733-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 04/02/2024] [Indexed: 04/07/2024] Open
Abstract
In the human genome, heterozygous sites refer to genomic positions with a different allele or nucleotide variant on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, also known as phasing, is achievable on a short-read sequencer when using a library preparation method that captures long-range genomic information. TELL-Seq is a library preparation method that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode is used to tag the reads derived from the same long DNA fragment within a range of up to 200 kilobases (kb), generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.
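The linked-read idea in this abstract (reads from the same long DNA fragment share a barcode, within a range of up to 200 kb) can be pictured with a toy grouping step. The sketch below is an illustration under stated assumptions, not the TELL-Seq analysis software: the function `group_linked_reads`, the `(barcode, chrom, pos)` read tuples and the example barcodes are all hypothetical, and real pipelines work on aligned reads with more careful rules for splitting fragments.

```python
from collections import defaultdict


def group_linked_reads(reads, max_span=200_000):
    """Group (barcode, chrom, pos) reads into inferred source fragments.

    Reads sharing a barcode on the same chromosome are merged into one
    fragment as long as they stay within max_span bases of the fragment
    start; a read beyond that opens a new fragment for the same barcode.
    Returns (barcode, chrom, start, end, read_count) tuples.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    fragments = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - start > max_span:
                # Too far from the fragment start: close it, open a new one.
                fragments.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        fragments.append((barcode, chrom, start, prev, count))
    return fragments


# Hypothetical reads: three "AAAC" reads fall within 200 kb, one is far away.
toy_reads = [
    ("AAAC", "chr17", 1_000),
    ("AAAC", "chr17", 150_000),
    ("GGGT", "chr17", 2_000),
    ("AAAC", "chr17", 900_000),
    ("AAAC", "chr17", 50_000),
]
for fragment in group_linked_reads(toy_reads):
    print(fragment)
```

Once reads are grouped this way, heterozygous alleles seen on reads from one fragment can be assigned to the same chromosomal copy, which is the basis of phasing with linked reads.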
Affiliation(s)
- Madison Rzepka
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Yu Xia
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Peter L Chang
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Amber Paasch
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Long Pham
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Naisarg Modi
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Likun Yao
- Department of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Adrian Perez-Agustin
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Sara Pagans
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Ming Lei
- Universal Sequencing Technology Corp., Canton, MA, 02021, USA
- Yong Wang
- Universal Sequencing Technology Corp., Canton, MA, 02021, USA
- Zhoutao Chen
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA.
16
Belay S, Belay G, Nigussie H, Jian-Lin H, Tijjani A, Ahbara AM, Tarekegn GM, Woldekiros HS, Mor S, Dobney K, Lebrasseur O, Hanotte O, Mwacharo JM. Whole-genome resource sequences of 57 indigenous Ethiopian goats. Sci Data 2024; 11:139. [PMID: 38287052 PMCID: PMC10825132 DOI: 10.1038/s41597-024-02973-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/05/2023] [Accepted: 01/16/2024] [Indexed: 01/31/2024] Open
Abstract
Domestic goats are distributed worldwide, with approximately 35% of the one billion world goat population occurring in Africa. Ethiopia has 52.5 million goats, ~99.9% of which are considered indigenous landraces deriving from animals introduced to the Horn of Africa in the distant past by nomadic herders. They have continued to be managed by smallholder farmers and semi-mobile pastoralists throughout the region. We report here 57 goat genomes from 12 Ethiopian goat populations sampled from different agro-climates. The data were generated by sequencing DNA samples on the Illumina NovaSeq 6000 platform with 150 bp paired-end reads at a mean depth of 9.71x. In total, ~2 terabytes of raw data were generated, and 99.8% of the clean reads mapped successfully against the goat reference genome assembly at a coverage of 99.6%. About 24.76 million SNPs were identified. These SNPs can be used to study the population structure and genome dynamics of goats at the country, regional, and global levels to shed light on the species' evolutionary trajectory.
Affiliation(s)
- Shumuye Belay
- Tigray Agricultural Research Institute, Mekelle, Tigray, Ethiopia.
- Addis Ababa University, Department of Microbial, Cellular and Molecular Biology, Addis Ababa, Ethiopia.
- LiveGene Program, International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia.
- Gurja Belay
- Addis Ababa University, Department of Microbial, Cellular and Molecular Biology, Addis Ababa, Ethiopia.
- Helen Nigussie
- Addis Ababa University, Department of Microbial, Cellular and Molecular Biology, Addis Ababa, Ethiopia
- Han Jian-Lin
- ILRI-CAAS Joint Laboratory on Livestock and Forage Genetic Resources, Beijing, China
- Abdulfatai Tijjani
- LiveGene Program, International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia
- Abulgasim M Ahbara
- Animal and Veterinary Sciences, Scotland's Rural College (SRUC), Roslin Institute Building, Midlothian, UK
- Department of Zoology, Misurata University, Misurata, Libya
- Getinet M Tarekegn
- Animal and Veterinary Sciences, Scotland's Rural College (SRUC), Roslin Institute Building, Midlothian, UK
- Institute of Biotechnology, Addis Ababa University, Addis Ababa, Ethiopia
- Helina S Woldekiros
- Department of Anthropology, Washington University in St. Louis, St. Louis, Missouri, USA
- Siobhan Mor
- LiveGene Program, International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
- Keith Dobney
- Department of Archaeology, Classics and Egyptology, University of Liverpool, Liverpool, UK
- School of Philosophical and Historical Inquiry, University of Sydney, Sydney, Australia
- Ophelie Lebrasseur
- Department of Archaeology, Classics and Egyptology, University of Liverpool, Liverpool, UK
- Olivier Hanotte
- LiveGene Program, International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia
- School of Life Sciences, University of Nottingham, Nottingham, UK
- Joram M Mwacharo
- Animal and Veterinary Sciences, Scotland's Rural College (SRUC), Roslin Institute Building, Midlothian, UK.
- Small Ruminant Genomics, International Centre for Agricultural Research in the Dry Areas (ICARDA), Addis Ababa, Ethiopia.
17
Geng Z, Li W, Yang P, Zhang S, Wu S, Xiong J, Sun K, Zhu D, Chen S, Zhang B. Whole exome sequencing reveals genetic landscape associated with left ventricular outflow tract obstruction in Chinese Han population. Front Genet 2023; 14:1267368. [PMID: 38164514 PMCID: PMC10757952 DOI: 10.3389/fgene.2023.1267368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 11/29/2023] [Indexed: 01/03/2024] Open
Abstract
Left ventricular outflow tract obstruction (LVOTO), a major form of outflow tract malformation, accounts for a substantial portion of congenital heart defects (CHDs). Despite its prevalence, the genetic architecture of LVOTO remains largely unknown. To unveil the genetic mutations and risk genes potentially associated with LVOTO, we enrolled a cohort of 106 LVOTO patients and 100 healthy controls and performed whole-exome sequencing (WES). A total of 71,430 rare deleterious mutations were found in LVOTO patients. Using gene-based burden testing, we further found 32 candidate genes enriched in LVOTO patients, including known pathological genes such as GATA5 and GATA6. Most variants of the 32 risk genes occur simultaneously rather than exclusively, suggesting polygenic inheritance of LVOTO, and 14 of the 32 risk genes interact with previously discovered CHD genes. Single-cell RNA-seq further revealed dynamic expression of GATA5, GATA6, FOXD3 and MYO6 in the endocardium and neural crest lineages, indicating that mutations of these genes may lead to LVOTO through different lineages. These findings uncover the genetic architecture of LVOTO and advance the current understanding of LVOTO genetics.
Affiliation(s)
- Zilong Geng
- Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Wenjuan Li
- Department of Pediatric Cardiology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Ping Yang
- Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Shasha Zhang
- Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Shuo Wu
- Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Junhao Xiong
- Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Kun Sun
- Department of Pediatric Cardiology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Dan Zhu
- Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
- Sun Chen
- Department of Pediatric Cardiology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Bing Zhang
- Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Department of Pediatric Cardiology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
18
Ousmael K, Whetten RW, Xu J, Nielsen UB, Lamour K, Hansen OK. Identification and high-throughput genotyping of single nucleotide polymorphism markers in a non-model conifer (Abies nordmanniana (Steven) Spach). Sci Rep 2023; 13:22488. [PMID: 38110478 PMCID: PMC10728141 DOI: 10.1038/s41598-023-49462-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 12/08/2023] [Indexed: 12/20/2023] Open
Abstract
Single nucleotide polymorphism (SNP) markers are powerful tools for investigating population structures, linkage analysis, and genome-wide association studies, as well as for breeding and population management. The availability of SNP markers has been limited to the most commercially important timber species, primarily due to the cost of genome sequencing required for SNP discovery. In this study, a combination of reference-based and reference-free approaches were used to identify SNPs in Nordmann fir (Abies nordmanniana), a species previously lacking genomic sequence information. Using a combination of a genome assembly of the closely related Silver fir (Abies alba) species and a de novo assembly of low-copy regions of the Nordmann fir genome, we identified a high density of reliable SNPs. Reference-based approaches identified two million SNPs in common between the Silver fir genome and low-copy regions of Nordmann fir. A combination of one reference-free and two reference-based approaches identified 250 shared SNPs. A subset of 200 SNPs were used to genotype 342 individuals and thereby tested and validated in the context of identity analysis and/or clone identification. The tested SNPs successfully identified all ramets per clone and five mislabeled individuals via identity and genomic relatedness analysis. The identified SNPs will be used in ad hoc breeding of Nordmann fir in Denmark.
Affiliation(s)
- Kedra Ousmael
- Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark.
| | - Ross W Whetten
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27606, USA
| | - Jing Xu
- Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark
| | - Ulrik B Nielsen
- Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark
| | - Kurt Lamour
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
| | - Ole K Hansen
- Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark
|
19
|
Raja TV, Alex R, Singh U, Kumar S, Das AK, Sengar G, Singh AK. Genome wide mining of SNPs and INDELs through ddRAD sequencing in Sahiwal cattle. Anim Biotechnol 2023; 34:4885-4899. [PMID: 37093232 DOI: 10.1080/10495398.2023.2200517] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 04/25/2023]
Abstract
This study was conducted for genome-wide identification and annotation of single nucleotide polymorphisms (SNPs) and insertions and deletions (INDELs) in Sahiwal cattle. Double digest restriction-site associated DNA (ddRAD) sequencing, a reduced representation method, was used to identify variants at the nucleotide level. A total of 1,615,211 variants were identified at RD10 and Q30, consisting of 1,480,930 SNPs and 134,281 INDELs with respect to the Bos taurus reference genome. The SNPs were annotated for their location, impact and functional class. The SNPs identified in Sahiwal cattle were found to be associated with a total of 26,229 genes. A total of 1819 SNPs were annotated for 209 candidate genes associated with different production and reproduction traits. The variants identified in the present study may be useful to strengthen the existing bovine SNP chips by reducing their bias toward taurine cattle breeds. The diversity analysis provides insight into the genetic architecture of the Sahiwal population studied. The large genetic variation identified at the nucleotide level provides ample scope for implementing an effective and efficient breed improvement programme for increasing the productivity of Sahiwal cattle.
Affiliation(s)
- Thiruvothur Venkatesan Raja
- Molecular Genetics Laboratory, Cattle Genetics and Breeding Division, ICAR-Central Institute for Research on Cattle, Meerut Cantt, Uttar Pradesh, India
| | - Rani Alex
- ICAR-National Dairy Research Institute, Karnal, Haryana, India
| | - Umesh Singh
- Molecular Genetics Laboratory, Cattle Genetics and Breeding Division, ICAR-Central Institute for Research on Cattle, Meerut Cantt, Uttar Pradesh, India
| | - Sushil Kumar
- Molecular Genetics Laboratory, Cattle Genetics and Breeding Division, ICAR-Central Institute for Research on Cattle, Meerut Cantt, Uttar Pradesh, India
| | - Achintya Kumar Das
- Molecular Genetics Laboratory, Cattle Genetics and Breeding Division, ICAR-Central Institute for Research on Cattle, Meerut Cantt, Uttar Pradesh, India
| | - Gyanendra Sengar
- National Research Centre on Pigs, Rani (Near Airport), Guwahati, Assam, India
| | - Amit Kumar Singh
- Molecular Genetics Laboratory, Cattle Genetics and Breeding Division, ICAR-Central Institute for Research on Cattle, Meerut Cantt, Uttar Pradesh, India
|
20
|
Yu C, Lan X, Tao Y, Guo Y, Sun D, Qian P, Zhou Y, Walters R, Li L, Zhu Y, Zeng J, Millwood I, Guo R, Pei P, Yang T, Du H, Yang F, Yang L, Ren F, Chen Y, Chen F, Jiang X, Ye Z, Dai L, Wei X, Xu X, Yang H, Wang J, Chen Z, Zhu H, Lv J, Jin X, Li L. A high-resolution haplotype-resolved Reference panel constructed from the China Kadoorie Biobank Study. Nucleic Acids Res 2023; 51:11770-11782. [PMID: 37870428 PMCID: PMC10681741 DOI: 10.1093/nar/gkad779] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2023] [Revised: 08/02/2023] [Accepted: 09/12/2023] [Indexed: 10/24/2023]
Abstract
Precision medicine depends on high-accuracy individual-level genotype data. However, whole-genome sequencing (WGS) is still not suitable for very large studies due to budget constraints. It is therefore particularly important to construct a highly accurate haplotype reference panel for genotype imputation. In this study, we used 10 000 samples with medium-depth WGS to construct a reference panel that we named the CKB reference panel. In imputing microarray datasets, the CKB panel outperformed the compared panels in terms of both the number of well-imputed variants and imputation accuracy. In addition, we have completed the imputation of 100 706 microarrays with the CKB panel, and the imputed data are, to date, the largest whole-genome dataset of the Chinese population. Furthermore, in a GWAS of a real phenotype, height, the number of tested SNPs tripled and the number of significant SNPs doubled after imputation. Finally, we developed an online server offering a free genotype imputation service based on the CKB reference panel (https://db.cngb.org/imputation/). We believe that the CKB panel is of great value for imputing microarray or low-coverage genotype data of the Chinese population, and potentially of mixed populations. The imputed data for the 100 706 microarrays are an enormous and precious resource for population genetic studies of complex traits and diseases.
Affiliation(s)
- Canqing Yu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing 100191, China
- Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing 100191, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing 100191, China
| | - Xianmei Lan
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | - Ye Tao
- BGI Research, Shenzhen 518083, China
| | - Yu Guo
- National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Dianjianyi Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing 100191, China
- Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing 100191, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing 100191, China
| | - Puyi Qian
- China National GeneBank, BGI, Shenzhen 518083, China
| | - Yuwen Zhou
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | - Robin G Walters
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
- Medical Research Council Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
| | - Linxuan Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | - Yunqing Zhu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing 100191, China
| | - Jingyu Zeng
- BGI Research, Shenzhen 518083, China
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Iona Y Millwood
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
- Medical Research Council Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
| | - Pei Pei
- Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing 100191, China
| | - Tao Yang
- China National GeneBank, BGI, Shenzhen 518083, China
| | - Huaidong Du
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
- Medical Research Council Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
| | - Fan Yang
- China National GeneBank, BGI, Shenzhen 518083, China
| | - Ling Yang
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
- Medical Research Council Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
| | - Fangyi Ren
- China National GeneBank, BGI, Shenzhen 518083, China
| | - Yiping Chen
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
- Medical Research Council Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
| | - Fengzhen Chen
- China National GeneBank, BGI, Shenzhen 518083, China
| | - Xiaosen Jiang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | - Zhiqiang Ye
- China National GeneBank, BGI, Shenzhen 518083, China
| | - Lanlan Dai
- China National GeneBank, BGI, Shenzhen 518083, China
| | - Xiaofeng Wei
- China National GeneBank, BGI, Shenzhen 518083, China
| | - Xun Xu
- BGI Research, Shenzhen 518083, China
- Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen 518083, China
| | - Huanming Yang
- BGI Research, Shenzhen 518083, China
- Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI, Shenzhen 518083, China
- James D. Watson Institute of Genome Sciences, Hangzhou 310013, China
| | - Zhengming Chen
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
- Medical Research Council Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, United Kingdom
| | - Jun Lv
- Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing 100191, China
- State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
| | - Xin Jin
- BGI Research, Shenzhen 518083, China
- School of Medicine, South China University of Technology, Guangzhou 510006, China
| | - Liming Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing 100191, China
- Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing 100191, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing 100191, China
|
21
|
Childebayeva A, Zavala EI. Review: Computational analysis of human skeletal remains in ancient DNA and forensic genetics. iScience 2023; 26:108066. [PMID: 37927550 PMCID: PMC10622734 DOI: 10.1016/j.isci.2023.108066] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/07/2023]
Abstract
Degraded DNA is used to answer questions in the fields of ancient DNA (aDNA) and forensic genetics. While aDNA studies typically center around human evolution and past history, and forensic genetics is often more concerned with identifying a specific individual, scientists in both fields face similar challenges. The overlap in source material has prompted periodic discussions and studies on the advantages of collaboration between fields toward mutually beneficial methodological advancements. However, most have been centered around wet laboratory methods (sampling, DNA extraction, library preparation, etc.). In this review, we focus on the computational side of the analytical workflow. We discuss limitations and considerations to keep in mind when working with degraded DNA. We hope this review provides researchers new to computational workflows with a framework for thinking about the analysis of highly degraded DNA, and that it prompts increased collaboration between the forensic genetics and aDNA fields.
Affiliation(s)
- Ainash Childebayeva
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Anthropology, University of Kansas, Lawrence, KS, USA
| | - Elena I. Zavala
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Biology, University of Oregon, Eugene, OR, USA
|
22
|
Lindtke D, Seefried FR, Drögemüller C, Neuditschko M. Increased heterozygosity in low-pass sequencing data allows identification of blood chimeras in cattle. Anim Genet 2023; 54:613-618. [PMID: 37313694 DOI: 10.1111/age.13334] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/31/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/15/2023]
Abstract
In about 90% of multiple pregnancies in cattle, shared blood circulation between fetuses leads to genetic chimerism in peripheral blood and can reduce reproductive performance in heterosexual co-twins. However, the early detection of heterosexual chimeras requires specialized tests. Here, we used low-pass sequencing data with a median coverage of 0.64× generated from blood samples of 322 F1 crosses between beef and dairy cattle and identified 20 putative blood chimeras through increased levels of genome-wide heterozygosity. In contrast, for 77 samples with routine SNP microarray data generated from hair bulbs of the same F1s, we found no evidence of chimerism, simultaneously observing high levels of genotype discordance with sequencing data. Fifteen out of 18 reported twins showed signs of blood chimerism, in line with previous reports, whereas the presence of five alleged singletons with strong signs of chimerism suggests that the in-utero death rate of co-twins is at the upper limit of former estimates. Together, our results show that low-pass sequencing data allow reliable screening for blood chimeras. They further affirm that blood is not recommended as a source of DNA for the detection of germline variants.
|
23
|
Moudgil A, Sobti RC, Kaur T. In-silico identification and comparison of transcription factor binding sites cluster in anterior-posterior patterning genes in Drosophila melanogaster and Tribolium castaneum. PLoS One 2023; 18:e0290035. [PMID: 37590227 PMCID: PMC10434971 DOI: 10.1371/journal.pone.0290035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 07/26/2023] [Indexed: 08/19/2023]
Abstract
The cis-regulatory information that governs transcriptional regulation is arranged into modular pieces of a few hundred base pairs called cis-regulatory modules (CRMs), and numerous binding sites for multiple transcription factors are a prominent characteristic of these modules. The present study was designed to localize transcription factor binding site (TFBS) clusters in twelve anterior-posterior (A-P) patterning genes in Tribolium castaneum and compare them to their orthologous gene enhancers in Drosophila melanogaster. Of the twelve A-P patterning genes, six were gap genes (Kruppel, Knirps, Tailless, Hunchback, Giant, and Caudal) and six were pair-rule genes (Hairy, Runt, Even-skipped, Fushi-tarazu, Paired, and Odd-skipped). The genes, along with 20 kb upstream and downstream regions, were scanned for TFBS clusters using the Motif Cluster Alignment Search Tool (MCAST), a bioinformatics tool that searches a set of nucleotide sequences for statistically significant clusters of non-overlapping occurrences of a given set of motifs. The motifs used in the current study were Hunchback, Caudal, Giant, Kruppel, Knirps, and Even-skipped. The results of the MCAST analysis revealed the maximum number of TFBS for Hunchback, Knirps, Caudal, and Kruppel in both D. melanogaster and T. castaneum, while Bicoid TFBS clusters were found only in D. melanogaster. All of the predicted TFBS clusters were smaller than 1 kb in both insect species. These sequences revealed more transversional sites (Tv) than transitional sites (Ti), and the average Ti/Tv ratio was 0.75.
Affiliation(s)
- Anshika Moudgil
- Department of Zoology, DAV University, Jalandhar, Punjab, India
| | - Tejinder Kaur
- Department of Zoology, DAV University, Jalandhar, Punjab, India
|
24
|
Wang Y, Zhang H, Zhu S, Shen T, Pan H, Xu M. Association Mapping and Expression Analysis of the Genes Involved in the Wood Formation of Poplar. Int J Mol Sci 2023; 24:12662. [PMID: 37628843 PMCID: PMC10454019 DOI: 10.3390/ijms241612662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 08/04/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023]
Abstract
Xylogenesis is a complex and sequential biosynthetic process controlled by polygenes. Deciphering the genetic architecture of this complex quantitative trait could provide valuable information for increasing wood biomass and improving its properties. Here, we performed genomic resequencing of 64 24-year-old trees (64 hybrids of section Aigeiros and their parents) grown in the same field and conducted full-sib family-based association analyses of two growth and six woody traits, with GEMMA as the chosen association model. We identified 1342 significantly associated single nucleotide polymorphisms (SNPs), 673 of which were located in regions upstream and downstream of 565 protein-encoding genes. The transcriptional regulation network of secondary cell wall (SCW) biosynthesis was further constructed based on published poplar miRNA, transcriptome, and degradome data. These results provide a scientific basis for an in-depth understanding of the mechanism of poplar timber formation and for future molecular-assisted breeding.
Affiliation(s)
- Meng Xu
- Co-Innovation Center for Sustainable Forestry in Southern China, State Key Laboratory of Tree Genetics and Breeding, Nanjing Forestry University, Nanjing 210037, China; (Y.W.); (H.Z.); (S.Z.); (T.S.); (H.P.)
|
25
|
Cirnigliaro M, Chang TS, Arteaga SA, Pérez-Cano L, Ruzzo EK, Gordon A, Bicks LK, Jung JY, Lowe JK, Wall DP, Geschwind DH. The contributions of rare inherited and polygenic risk to ASD in multiplex families. Proc Natl Acad Sci U S A 2023; 120:e2215632120. [PMID: 37506195 PMCID: PMC10400943 DOI: 10.1073/pnas.2215632120] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 09/15/2022] [Accepted: 06/13/2023] [Indexed: 07/30/2023]
Abstract
Autism spectrum disorder (ASD) has a complex genetic architecture involving contributions from both de novo and inherited variation. Few studies have been designed to address the role of rare inherited variation or its interaction with common polygenic risk in ASD. Here, we performed whole-genome sequencing of the largest cohort of multiplex families to date, consisting of 4,551 individuals in 1,004 families having two or more autistic children. Using this study design, we identify seven previously unrecognized ASD risk genes supported by a majority of rare inherited variants, finding support for a total of 74 genes in our cohort and a total of 152 genes after combined analysis with other studies. Autistic children from multiplex families demonstrate an increased burden of rare inherited protein-truncating variants in known ASD risk genes. We also find that ASD polygenic score (PGS) is overtransmitted from nonautistic parents to autistic children who also harbor rare inherited variants, consistent with combinatorial effects in the offspring, which may explain the reduced penetrance of these rare variants in parents. We also observe that in addition to social dysfunction, language delay is associated with ASD PGS overtransmission. These results are consistent with an additive complex genetic risk architecture of ASD involving rare and common variation and further suggest that language delay is a core biological feature of ASD.
Affiliation(s)
- Matilde Cirnigliaro
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
| | - Timothy S. Chang
- Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
| | - Stephanie A. Arteaga
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
| | - Laura Pérez-Cano
- STALICLA Discovery and Data Science Unit, World Trade Center, Barcelona08039, Spain
| | - Elizabeth K. Ruzzo
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
| | - Aaron Gordon
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
| | - Lucy K. Bicks
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
| | - Jae-Yoon Jung
- Department of Pediatrics, Division of Systems Medicine, Stanford University, Stanford, CA94304
- Department of Biomedical Data Science, Stanford University, Stanford, CA94305
| | - Jennifer K. Lowe
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
| | - Dennis P. Wall
- Department of Pediatrics, Division of Systems Medicine, Stanford University, Stanford, CA94304
- Department of Biomedical Data Science, Stanford University, Stanford, CA94305
| | - Daniel H. Geschwind
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
- Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA90095
|
26
|
Yauy K, Van Goethem C, Pégeot H, Baux D, Guignard T, Thèze C, Ardouin O, Roux AF, Koenig M, Bergougnoux A, Cossée M. Evaluating the Transition from Targeted to Exome Sequencing: A Guide for Clinical Laboratories. Int J Mol Sci 2023; 24:7330. [PMID: 37108493 PMCID: PMC10138641 DOI: 10.3390/ijms24087330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Revised: 04/03/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023]
Abstract
The transition from targeted to exome or genome sequencing in clinical contexts requires quality standards comparable to those of targeted sequencing in order to be fully adopted. However, no clear recommendations or methodology have emerged for evaluating this technological evolution. We developed a structured method based on four run-specific sequencing metrics and seven sample-specific sequencing metrics for evaluating the performance of exome sequencing strategies to replace targeted strategies. The indicators include quality metrics and coverage performance on gene panels and OMIM morbid genes. We applied this general strategy to three different exome kits and compared them with a myopathy-targeted sequencing method. After having achieved 80 million reads, all tested exome kits generated data suitable for clinical diagnosis. However, significant differences in coverage and PCR duplicates were observed between the kits. These are the two main criteria to consider for an initial implementation with high quality assurance. This study aims to assist molecular diagnostic laboratories in adopting and evaluating exome sequencing kits in a diagnostic context, in comparison with the strategy used previously. A similar strategy could be used to implement whole-genome sequencing for diagnostic purposes.
Affiliation(s)
- Kevin Yauy
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
- Service de Génétique Médicale, CHU Montpellier, 371 Avenue du Doyen G. Giraud, 34090 Montpellier, France
| | - Charles Van Goethem
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
| | - Henri Pégeot
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
| | - David Baux
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
- INM, Université de Montpellier, INSERM, Hôpital Saint Eloi-Bâtiment INM 80, rue Augustin Fliche-BP 74103, 34090 Montpellier, France
| | - Thomas Guignard
- Unité de Génétique Chromosomique, Département de Génétique Médicale, Maladies Rares et Médecine Personnalisée, Hôpital Arnaud de Villeneuve, CHU de Montpellier, 371 Av. du Doyen Gaston Giraud, 34090 Montpellier, France
| | - Corinne Thèze
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
| | - Olivier Ardouin
- Plateau de Médecine Moléculaire et Génomique, Hôpital Arnaud de Villeneuve, CHU de Montpellier, 34090 Montpellier, France
| | - Anne-Françoise Roux
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
- INM, Université de Montpellier, INSERM, Hôpital Saint Eloi-Bâtiment INM 80, rue Augustin Fliche-BP 74103, 34090 Montpellier, France
| | - Michel Koenig
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
- PhyMedExp-Physiologie et Médecine Expérimentale du Cœur et des Muscles, Université de Montpellier, Inserm U1046, CNRS UMR 9214, 371 Avenue du Doyen G. Giraud, 34090 Montpellier, France
| | - Anne Bergougnoux
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
- PhyMedExp-Physiologie et Médecine Expérimentale du Cœur et des Muscles, Université de Montpellier, Inserm U1046, CNRS UMR 9214, 371 Avenue du Doyen G. Giraud, 34090 Montpellier, France
| | - Mireille Cossée
- Laboratoire de Génétique Moléculaire, LGM, Centre Hospitalier Universitaire de Montpellier, IURC-Institut Universitaire de Recherche Clinique, 641 Avenue du Doyen G. Giraud, 34090 Montpellier, France
- PhyMedExp-Physiologie et Médecine Expérimentale du Cœur et des Muscles, Université de Montpellier, Inserm U1046, CNRS UMR 9214, 371 Avenue du Doyen G. Giraud, 34090 Montpellier, France
|
27
|
Jackson A, Lin SJ, Jones EA, Chandler KE, Orr D, Moss C, Haider Z, Ryan G, Holden S, Harrison M, Burrows N, Jones WD, Loveless M, Petree C, Stewart H, Low K, Donnelly D, Lovell S, Drosou K, Varshney GK, Banka S. Clinical, genetic, epidemiologic, evolutionary, and functional delineation of TSPEAR-related autosomal recessive ectodermal dysplasia 14. HGG ADVANCES 2023; 4:100186. [PMID: 37009414 PMCID: PMC10064225 DOI: 10.1016/j.xhgg.2023.100186] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Accepted: 02/27/2023] [Indexed: 06/11/2023]
Abstract
TSPEAR variants cause autosomal recessive ectodermal dysplasia (ARED) 14. The function of TSPEAR is unknown. The clinical features, the mutation spectrum, and the underlying mechanisms of ARED14 are poorly understood. Combining data from new and previously published individuals established that ARED14 is primarily characterized by dental anomalies such as conical tooth cusps and hypodontia, like those seen in individuals with WNT10A-related odontoonychodermal dysplasia. AlphaFold-predicted structure-based analysis showed that most of the pathogenic TSPEAR missense variants likely destabilize the β-propeller of the protein. Analysis of 100,000 Genomes Project (100KGP) data revealed multiple founder TSPEAR variants across different populations. Mutational and recombination clock analyses demonstrated that non-Finnish European founder variants likely originated around the end of the last ice age, a period of major climatic transition. Analysis of gnomAD data showed that the TSPEAR carrier rate in the non-Finnish European population is ∼1/140, making ARED14 one of the commonest AREDs. Phylogenetic and AlphaFold structural analyses showed that TSPEAR is an ortholog of Drosophila Closca, an extracellular matrix-dependent signaling regulator. We therefore hypothesized that TSPEAR could have a role in the enamel knot, a structure that coordinates patterning of developing tooth cusps. Analysis of mouse single-cell RNA sequencing (scRNA-seq) data revealed highly restricted expression of Tspear in clusters representing enamel knots. A tspeara -/-;tspearb -/- double-knockout zebrafish model recapitulated the clinical features of ARED14 and the fin regeneration abnormalities of wnt10a knockout fish, thus suggesting interaction between tspear and wnt10a. In summary, we provide insights into the role of TSPEAR in ectodermal development and the evolutionary history, epidemiology, mechanisms, and consequences of its loss-of-function variants.
Affiliation(s)
- Adam Jackson
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
- Sheng-Jia Lin
  - Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Elizabeth A. Jones
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
- Kate E. Chandler
  - Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
- David Orr
  - Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
- Celia Moss
  - Department of Dermatology, Birmingham Children’s Hospital, Birmingham Women’s and Children’s NHS Foundation Trust, Birmingham, UK
- Zahra Haider
  - Department of Dermatology, Birmingham Children’s Hospital, Birmingham Women’s and Children’s NHS Foundation Trust, Birmingham, UK
- Gavin Ryan
  - West Midlands Regional Genetics Laboratory, Birmingham Women’s and Children’s NHS Foundation Trust, Birmingham, UK
- Simon Holden
  - Clinical Genetics, Addenbrooke’s Hospital, Cambridge, UK
- Mike Harrison
  - Department of Pediatric Dentistry, Guy’s and St Thomas’ Dental Institute, London, UK
- Nigel Burrows
  - Department of Dermatology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Wendy D. Jones
  - North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children, Great Ormond Street NHS Foundation Trust, London, UK
- Mary Loveless
  - Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Cassidy Petree
  - Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Helen Stewart
  - Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Karen Low
  - Department of Clinical Genetics, St Michael’s Hospital, Bristol, UK
- Deirdre Donnelly
  - Department of Genetic Medicine, Belfast HSC Trust, Lisburn Road, Belfast, UK
- Simon Lovell
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Konstantina Drosou
  - Department of Earth and Environmental Sciences, Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
  - Division of Cell Matrix Biology and Regenerative Medicine, Faculty of Biology, Medicine and Health, University of Manchester, 99 Oxford Road, Manchester, UK
- Gaurav K. Varshney
  - Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Siddharth Banka
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
28
Valverde-Hernández JC, Flores-Cruz A, Chavarría-Soley G, Silva de la Fuente S, Campos-Sánchez R. Frequencies of variants in genes associated with dyslipidemias identified in Costa Rican genomes. Front Genet 2023; 14:1114774. [PMID: 37065472 PMCID: PMC10098023 DOI: 10.3389/fgene.2023.1114774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 03/14/2023] [Indexed: 04/18/2023]
Abstract
Dyslipidemias are risk factors in diseases of significant importance to public health, such as atherosclerosis, a condition that contributes to the development of cardiovascular disease. Unhealthy lifestyles, the pre-existence of diseases, and the accumulation of genetic variants in some loci contribute to the development of dyslipidemia. The genetic causality behind these diseases has been studied primarily in populations with extensive European ancestry. Only some studies have explored this topic in Costa Rica, and none have focused on identifying variants that can alter blood lipid levels and quantifying their frequency. To fill this gap, this study focused on identifying variants in 69 genes involved in lipid metabolism using genomes from two studies in Costa Rica. We contrasted the allelic frequencies with those of groups reported in the 1000 Genomes Project and gnomAD and identified potential variants that could influence the development of dyslipidemias. In total, we detected 2,600 variants in the evaluated regions. However, after various filtering steps, we obtained 18 variants that have the potential to alter the function of 16 genes; nine variants have pharmacogenomic or protective implications, eight are rated high risk in Variant Effect Predictor, and eight were found in other Latin American genetic studies of lipid alterations and the development of dyslipidemia. Some of these variants have been linked to changes in blood lipid levels in other global studies and databases. In future studies, we propose to confirm at least 40 variants of interest from 23 genes in a larger cohort from Costa Rica and Latin American populations to determine their relevance regarding the genetic burden for dyslipidemia. Additionally, more complex studies are needed that include diverse clinical, environmental, and genetic data from patients and controls, as well as functional validation of the variants.
Affiliation(s)
- Andrés Flores-Cruz
  - Centro de Investigación en Biología Celular y Molecular, University of Costa Rica, San José, Costa Rica
- Gabriela Chavarría-Soley
  - Centro de Investigación en Biología Celular y Molecular, University of Costa Rica, San José, Costa Rica
  - Escuela de Biología, University of Costa Rica, San José, Costa Rica
- Sandra Silva de la Fuente
  - Centro de Investigación en Biología Celular y Molecular, University of Costa Rica, San José, Costa Rica
- Rebeca Campos-Sánchez
  - Centro de Investigación en Biología Celular y Molecular, University of Costa Rica, San José, Costa Rica
29
Zhai Y, Bardel C, Vallée M, Iwaz J, Roy P. Performance comparisons between clustering models for reconstructing NGS results from technical replicates. Front Genet 2023; 14:1148147. [PMID: 37007945 PMCID: PMC10060969 DOI: 10.3389/fgene.2023.1148147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 03/06/2023] [Indexed: 03/18/2023]
Abstract
To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila–adapted k-means, and random forest) regarding four performance indicators: sensitivity, precision, accuracy, and F1-score. In comparison with no use of a combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model brought 1% precision improvement (97%–98%) without compromising sensitivity (= 98.9%); iii) the Gaussian mixture model and random forest provided callsets with higher precisions (both >99%) but lower sensitivities; iv) Kamila increased precision (>99%) and kept a high sensitivity (98.8%); it showed the best overall performance. According to precision and F1-score indicators, the compared non-supervised clustering models that combine multiple callsets are able to improve sequencing performance vs. previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may thus be recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes.
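As a rough illustration of the simplest model named in this abstract, the sketch below combines three replicate callsets by majority vote and scores the result against a truth set using sensitivity, precision, and F1-score. It is a minimal sketch, not the authors' implementation: the abstract does not define its consensus rule, so the two-of-three vote is an assumption, and the variant identifiers, function names, and data are all hypothetical. Accuracy is omitted because it requires true negatives, which set-based callsets do not carry.

```python
from collections import Counter

def consensus_callset(replicates, min_support=2):
    """Keep variants called in at least `min_support` replicates.

    Assumption: a simple majority vote; the source does not specify the rule.
    """
    counts = Counter(variant for calls in replicates for variant in set(calls))
    return {variant for variant, n in counts.items() if n >= min_support}

def performance(calls, truth):
    """Sensitivity, precision, and F1-score of a callset against a truth set."""
    tp = len(calls & truth)   # true positives
    fp = len(calls - truth)   # false positives
    fn = len(truth - calls)   # false negatives
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}

# Hypothetical toy data: four true variants, three noisy replicates.
truth = {"chr1:100A>G", "chr1:200C>T", "chr2:50G>A", "chr2:75T>C"}
replicates = [
    {"chr1:100A>G", "chr1:200C>T", "chr2:50G>A", "chr3:10A>T"},
    {"chr1:100A>G", "chr1:200C>T", "chr2:75T>C"},
    {"chr1:100A>G", "chr2:50G>A", "chr2:75T>C", "chr3:99C>G"},
]

consensus = consensus_callset(replicates)
single = performance(replicates[0], truth)
combined = performance(consensus, truth)
print("replicate 1:", single)
print("consensus:  ", combined)
```

In this toy case each false positive appears in only one replicate, so the vote removes it while every true variant keeps at least two supporting calls. Real data behave less cleanly, which is why the study compares the consensus rule with model-based alternatives.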
Affiliation(s)
- Yue Zhai
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Correspondence: Yue Zhai
- Claire Bardel
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
  - Service de Génétique, Hospices Civils de Lyon, Bron, France
- Maxime Vallée
  - Cellule Bioinformatique de La Plateforme de Séquençage Haut Débit NGS-HCL, Hospices Civils de Lyon, Bron, France
- Jean Iwaz
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
- Pascal Roy
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
30
Genomic diversity and signals of selection processes in wild and farm-reared red-legged partridges (Alectoris rufa). Genomics 2023; 115:110591. [PMID: 36849018 DOI: 10.1016/j.ygeno.2023.110591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 02/18/2023] [Accepted: 02/22/2023] [Indexed: 02/27/2023]
Abstract
The genetic dynamics of wild populations with releases of farm-reared reinforcements are very complex. These releases can endanger wild populations through genetic swamping or by displacing them. We assessed the genomic differences between wild and farm-reared red-legged partridges (Alectoris rufa) and described differential selection signals between both populations. We sequenced the whole genome of 30 wild and 30 farm-reared partridges. Both partridges had similar nucleotide diversity (π). Farm-reared partridges had a more negative Tajima's D and more and longer regions of extended haplotype homozygosity than wild partridges. We observed higher inbreeding coefficients (FIS and FROH) in wild partridges. Selective sweeps (Rsb) were enriched with genes that contribute to the reproductive, skin and feather colouring, and behavioural differences between wild and farm-reared partridges. The analysis of genomic diversity should inform future decisions for the preservation of wild populations.
31
Whole exome sequencing of 28 families of Danish descent reveals novel candidate genes and pathways in developmental dysplasia of the hip. Mol Genet Genomics 2023; 298:329-342. [PMID: 36454308 PMCID: PMC9938029 DOI: 10.1007/s00438-022-01980-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/17/2022] [Accepted: 11/15/2022] [Indexed: 12/05/2022]
Abstract
Developmental dysplasia of the hip (DDH) is a common condition involving instability of the hip with multifactorial etiology. Early diagnosis and treatment are critical, as undetected DDH is an important cause of long-term hip complications. Better diagnostics may be achieved through genetic methods, especially for patients with a positive family history. Several candidate genes have been reported, but the exact molecular etiology of the disease is still unknown. In the present study, we performed whole exome sequencing of DDH patients from 28 families with at least two affected first-degree relatives. Four genes previously not associated with DDH (METTL21B, DIS3L2, PPP6R2, and TM4SF19) were identified with the same variants shared among affected family members in more than two families. Among known association genes, we found damaging variants in DACH1, MYH10, NOTCH2, TBX4, EVC2, OTOG, and SHC3. Mutational burden analysis across the families identified 322 candidate genes, and enriched pathways include the extracellular matrix, cytoskeleton, ion-binding, and detection of mechanical stimulus. Taken together, our data suggest a polygenic mode of inheritance for DDH, and we propose that impaired transduction of the mechanical stimulus is involved in the etiopathological mechanism. Our findings refine our current understanding of candidate causal genes in DDH and provide a foundation for downstream functional studies.
32
Høy Hansen M, Steensboe Lang C, Abildgaard N, Nyvold CG. Comparative evaluation of the heterozygous variant standard deviation as a quality measure for next-generation sequencing. J Biomed Inform 2022; 135:104234. [DOI: 10.1016/j.jbi.2022.104234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2022] [Revised: 09/15/2022] [Accepted: 10/17/2022] [Indexed: 11/30/2022]
33
Faulk C. De novo sequencing, diploid assembly, and annotation of the black carpenter ant, Camponotus pennsylvanicus, and its symbionts by one person for $1000, using nanopore sequencing. Nucleic Acids Res 2022; 51:17-28. [PMID: 35724982 PMCID: PMC9841434 DOI: 10.1093/nar/gkac510] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/01/2022] [Revised: 05/19/2022] [Accepted: 05/31/2022] [Indexed: 02/07/2023]
Abstract
The black carpenter ant (Camponotus pennsylvanicus) is a pest species found widely throughout North America. From a single individual I used long-read nanopore sequencing to assemble a phased diploid genome of 306 Mb and 60X coverage, with quality assessed by a 97.0% BUSCO score, improving upon other ant assemblies. The mitochondrial genome reveals minor rearrangements from other ants. The reads also allowed assembly of parasitic and symbiont genomes. I include a complete Wolbachia bacterial assembly with a size of 1.2 Mb, as well as a commensal symbiont Blochmannia pennsylvanicus, at 791 kb. DNA methylation and hydroxymethylation were measured at base-pair resolution level from the same reads and confirmed extremely low levels seen in the Formicidae family. There was moderate heterozygosity, with 0.16% of bases being biallelic from the parental haplotypes. Protein prediction yielded 14 415 amino acid sequences with 95.8% BUSCO score and 86% matching to previously known proteins. All assemblies were derived from a single MinION flow cell generating 20 Gb of sequence for a cost of $1047 including consumable reagents. Adding fixed costs for equipment brings the total for an ant-sized genome to less than $5000. All analyses were performed in 1 week on a single desktop computer.
34
Vasilyeva TA, Marakhonov AV, Kutsev SI, Zinchenko RA. Relative Frequencies of PAX6 Mutational Events in a Russian Cohort of Aniridia Patients in Comparison with the World's Population and the Human Genome. Int J Mol Sci 2022; 23:ijms23126690. [PMID: 35743132 PMCID: PMC9223373 DOI: 10.3390/ijms23126690] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2022] [Revised: 06/06/2022] [Accepted: 06/13/2022] [Indexed: 12/10/2022]
Abstract
Genome-wide sequencing metadata allows researchers to infer bias in the relative frequencies of mutational events and to predict putative mutagenic models. In addition, much smaller datasets can be useful for evaluating the mutational frequency spectrum and the prevalent local mutagenic process. Here we analyzed the PAX6 gene locus for mutational spectra obtained in our own and previous studies and compared them with data on other genes as well as the whole human genome. MLPA and Sanger sequencing were used for mutation screening in a cohort of 199 index patients from Russia with aniridia and aniridia-related phenotypes. The relative frequencies of different categories of PAX6 mutations were consistent with those previously reported by other researchers. The ratio between substitutions, small indels, and chromosome deletions in the 11p13 locus was within the interval previously published for 20 disease-associated genomic loci, but fell at the higher end owing to very high frequencies of small indels and chromosome deletions. The ratio between substitutions, small indels, and chromosome deletions for disease-associated genes, including the PAX6 gene, as well as the share of PAX6 missense mutations, differed considerably from those typical for the whole genome.
Affiliation(s)
- Tatyana A. Vasilyeva
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Andrey V. Marakhonov
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
  - Correspondence: Tel.: +7-499-320-60-90
- Sergey I. Kutsev
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Rena A. Zinchenko
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
  - N.A. Semashko National Research Institute of Public Health, 105064 Moscow, Russia
35
Gibitova EA, Dobrynin PV, Pomerantseva EA, Musatova EV, Kostareva A, Evsyukov I, Rychkov SY, Zhukova OV, Naumova OY, Grigorenko EL. A Study of the Genomic Variations Associated with Autistic Spectrum Disorders in a Russian Cohort of Patients Using Whole-Exome Sequencing. Genes (Basel) 2022; 13:genes13050920. [PMID: 35627305 PMCID: PMC9141003 DOI: 10.3390/genes13050920] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2022] [Revised: 04/30/2022] [Accepted: 05/16/2022] [Indexed: 12/10/2022]
Abstract
This study provides new data on the whole-exome sequencing of a cohort of children with autistic spectrum disorders (ASD) from an underexplored Russian population. Using both a cross-sectional approach involving a control cohort of the same ancestry and an annotation-based approach involving relevant public databases, we explored exonic single nucleotide variants and copy-number variation potentially involved in the manifestation of ASD. The study results reveal new potential ASD candidate-variants found in the studied Russian cohort and show a high prevalence of common ASD-associated genomic variants, especially those in the genes known to be associated with the manifestation of intellectual disabilities. Our screening of an ASD cohort from a previously understudied population allowed us to flag at least a few novel genes (IGLJ2, FAM21A, OR11H12, HIP1, PRAMEF10, and ZNF717) regarding their potential involvement in ASD.
Affiliation(s)
- Ekaterina A. Gibitova
  - Computer Technologies Laboratory, University of Information Technologies, Mechanics and Optics, Saint Petersburg 197101, Russia
- Pavel V. Dobrynin
  - Computer Technologies Laboratory, University of Information Technologies, Mechanics and Optics, Saint Petersburg 197101, Russia
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Human Genetics Laboratory, Vavilov Institute of General Genetics RAS, Moscow 119991, Russia
- Ekaterina A. Pomerantseva
  - The ‘Genetico’ Center for Genetics and Reproductive Medicine, Moscow 119333, Russia
- Elizaveta V. Musatova
  - The ‘Genetico’ Center for Genetics and Reproductive Medicine, Moscow 119333, Russia
- Anna Kostareva
  - Almazov National Medical Research Centre, Saint Petersburg 197341, Russia
  - Department of Women’s and Children’s Health, Karolinska Institute, Stockholm 17177, Sweden
- Igor Evsyukov
  - Computer Technologies Laboratory, University of Information Technologies, Mechanics and Optics, Saint Petersburg 197101, Russia
- Sergey Y. Rychkov
  - Human Genetics Laboratory, Vavilov Institute of General Genetics RAS, Moscow 119991, Russia
- Olga V. Zhukova
  - Human Genetics Laboratory, Vavilov Institute of General Genetics RAS, Moscow 119991, Russia
- Oxana Y. Naumova
  - Human Genetics Laboratory, Vavilov Institute of General Genetics RAS, Moscow 119991, Russia
  - Department of Psychology, University of Houston, Houston, TX 77204, USA
  - Department of Psychology, Saint-Petersburg State University, Saint Petersburg 199034, Russia
  - Correspondence: (O.Y.N.); (E.L.G.)
- Elena L. Grigorenko
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Psychology, University of Houston, Houston, TX 77204, USA
  - Department of Psychology, Saint-Petersburg State University, Saint Petersburg 199034, Russia
  - Center of Cognitive Research, Sirius University of Science and Technology, Sochi 354340, Russia
  - Correspondence: (O.Y.N.); (E.L.G.)
36
Giovannetti A, Bianco SD, Traversa A, Panzironi N, Bruselles A, Lazzari S, Liorni N, Tartaglia M, Carella M, Pizzuti A, Mazza T, Caputo V. MiRLog and dbmiR: prioritization and functional annotation tools to study human microRNA sequence variants. Hum Mutat 2022; 43:1201-1215. [PMID: 35583122 PMCID: PMC9546175 DOI: 10.1002/humu.24399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 05/03/2022] [Accepted: 05/11/2022] [Indexed: 11/22/2022]
Abstract
The recent identification of noncoding variants with pathogenic effects suggests that these variations could underlie a significant number of undiagnosed cases. Several computational methods have been developed to predict the functional impact of noncoding variants, but they exhibit only partial concordance and are not integrated with functional annotation resources, so the interpretation of these variants remains challenging. MicroRNAs (miRNAs) are small noncoding RNA molecules that act as fine regulators of gene expression and play crucial roles in several biological processes, such as cell proliferation and differentiation. An increasing number of studies demonstrate a significant impact of miRNA single nucleotide variants (SNVs) both in Mendelian diseases and complex traits. To predict the functional effect of miRNA SNVs, we implemented a new meta-predictor, MiRLog, and we integrated it into a comprehensive database, dbmiR, which includes a precompiled list of all possible miRNA allelic SNVs, providing their biological annotations at nucleotide and miRNA levels. MiRLog and dbmiR were used to explore the genetic variability of miRNAs in 15,708 human genomes included in the gnomAD project, finding several ultra-rare SNVs with a potentially deleterious effect on miRNA biogenesis and function, which represent putative contributors to human phenotypes.
Affiliation(s)
- Agnese Giovannetti
  - Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
- Salvatore Daniele Bianco
  - Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
  - Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
- Alice Traversa
  - Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
- Noemi Panzironi
  - Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
- Alessandro Bruselles
  - Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
- Sara Lazzari
  - Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
- Niccolò Liorni
  - Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
  - Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
- Marco Tartaglia
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù, IRCCS, Rome, Italy
- Massimo Carella
  - Medical Genetics Unit, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
- Antonio Pizzuti
  - Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
- Tommaso Mazza
  - Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
- Viviana Caputo
  - Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
37
Null M, Dupuis J, Sheinidashtegol P, Layer RM, Gignoux CR, Hendricks AE. RAREsim: A simulation method for very rare genetic variants. Am J Hum Genet 2022; 109:680-691. [PMID: 35298919 PMCID: PMC9069075 DOI: 10.1016/j.ajhg.2022.02.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/07/2021] [Accepted: 02/09/2022] [Indexed: 11/18/2022]
Abstract
Identification of rare-variant associations is crucial to full characterization of the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real-variant annotation enables in silico comparison of methods, such as rare-variant association tests and polygenic scoring methods, that focus on putative causal variants. Existing simulation methods are either unable to employ real-variant annotation or severely under- or overestimate the number of singletons and doubletons, thereby reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare-variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real-variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.
Affiliation(s)
- Megan Null
  - Mathematical and Statistical Sciences, University of Colorado, Denver, Denver, CO 80204, USA
  - Mathematics and Physical Sciences, The College of Idaho, Caldwell, ID 83605, USA
- Josée Dupuis
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA 02215, USA
- Ryan M Layer
  - Boulder and BioFrontiers Institute, University of Colorado Boulder, Boulder, CO 80309, USA
  - Department of Computer Science, University of Colorado Boulder, Boulder, CO 80309, USA
- Christopher R Gignoux
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Audrey E Hendricks
  - Mathematical and Statistical Sciences, University of Colorado, Denver, Denver, CO 80204, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
38
Ebler J, Ebert P, Clarke WE, Rausch T, Audano PA, Houwaart T, Mao Y, Korbel JO, Eichler EE, Zody MC, Dilthey AT, Marschall T. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat Genet 2022; 54:518-525. [PMID: 35410384 PMCID: PMC9005351 DOI: 10.1038/s41588-022-01043-w] [Citation(s) in RCA: 121] [Impact Index Per Article: 40.3] [Received: 02/15/2021] [Accepted: 03/03/2022] [Indexed: 12/30/2022]
Abstract
Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with a substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing number of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster than alignment-based workflows.
Affiliation(s)
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Ebert
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tobias Rausch
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
  - European Molecular Biology Laboratory, GeneCore, Heidelberg, Germany
- Peter A Audano
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Torsten Houwaart
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Yafei Mao
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jan O Korbel
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Alexander T Dilthey
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany
  - Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Collapse
|
39
|
One Health and Cattle Genetic Resources: Mining More than 500 Cattle Genomes to Identify Variants in Candidate Genes Potentially Affecting Coronavirus Infections. Animals (Basel) 2022; 12:838. [PMID: 35405828 PMCID: PMC8997118 DOI: 10.3390/ani12070838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2022] [Revised: 03/14/2022] [Accepted: 03/24/2022] [Indexed: 12/17/2022] [Open Access]
Abstract
Simple Summary: The conservation and exploitation of cattle genetic resources for selection and breeding purposes are important for the definition of sustainable livestock production sectors. One Health approaches should be integrated into these activities to reduce the risk posed by many zoonoses. Coronaviruses are emerging as important zoonotic agents, with the potential to easily cross species barriers, as recently demonstrated by the COVID-19 pandemic caused by SARS-CoV-2. Genetic resistance to coronavirus infections can be determined by variants of the host (animal) genome segregating within species. In this study, we mined the genomes of more than 500 cattle to identify variants that could underlie different levels of susceptibility and/or resistance to coronavirus diseases in this important livestock species. Using comparative analyses across species, we identified several single amino acid polymorphisms that might alter the function of key proteins involved in the basic biological mechanisms underlying the infection processes in cattle. This study provides new elements for considering the genetic variability of the host (cattle) as a potential risk factor in One Health perspectives.
Abstract: Epidemiological and biological characteristics of coronaviruses and their ability to cross species barriers are a matter of increasing concern for these zoonotic agents. To prevent their spread, One Health approaches should be designed to include the host (animal) genome variability as a potential risk factor that might confer genetic resistance or susceptibility to coronavirus infections. At present, there is no example that considers cattle genetic resources for this purpose. In this study, we investigated the variability of six genes (ACE2, ANPEP, CEACAM1 and DPP4, encoding host receptors of coronaviruses; FURIN and TMPRSS2, encoding host proteases involved in coronavirus infection) by mining whole genome sequencing datasets from more than 500 cattle of 34 Bos taurus breeds and three related species. We identified a total of 180 protein variants (44 already known from the ARS-UCD1.2 reference genome). Some of them alter protein functions or the virus–host interaction and the related virus entry processes. The results obtained in this study constitute a first step towards the definition of a One Health strategy that includes cattle genetic resources as reservoirs of host gene variability useful to design conservation and selection programs to increase resistance to coronavirus diseases.
|
40
|
Decoding the Gene Variants of Two Native Probiotic Lactiplantibacillus plantarum Strains through Whole-Genome Resequencing: Insights into Bacterial Adaptability to Stressors and Antimicrobial Strength. Genes (Basel) 2022; 13:443. [PMID: 35327997 PMCID: PMC8953754 DOI: 10.3390/genes13030443] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/30/2021] [Revised: 01/25/2022] [Accepted: 01/26/2022] [Indexed: 02/05/2023] [Open Access]
Abstract
In this study, whole-genome resequencing of two native probiotic Lactiplantibacillus plantarum strains (UTNGt21A and UTNGt2) was performed to identify variants and annotate genes involved in bacterial adaptability to different stressors, as well as their antimicrobial strength. A total of 21,906 single-nucleotide polymorphisms (SNPs) were detected in UTNGt21A, while 17,610 were detected in the UTNGt2 genome. The comparative genomic analysis revealed a greater number of deletions, transversions, and transitions within the UTNGt21A genome, while only a small difference in the number of insertions was detected between the strains. Different numbers of variant annotation types were detected in the two strains and categorized in terms of low, moderate, and high modifier impact on protein effectiveness. Although both native strains shared specific genes involved in the stress response to the gastrointestinal environment (bile salt, acid, temperature, osmotic stress), which may qualify them as putative probiotics, they differed in their antimicrobial gene cluster organization, with UTNGt21A displaying a complex bacteriocin gene arrangement and dissimilar gene variants that might alter their defense mechanisms and overall inhibitory capacity. The genome comparison revealed 34 and 9 genomic islands (GIs) in the UTNGt21A and UTNGt2 genomes, respectively, with an overrepresentation of genes involved in defense mechanisms and carbohydrate utilization. In addition, pan-genome analysis disclosed the presence of various strain-specific genes (shell genes), suggesting high genome variation between strains. This genome analysis illustrates that the bacteriocin signature and gene variants reflect a niche-inherent pattern. These extensive genomic datasets will help us understand the potential benefits of the native strains and their utility in the food or pharmaceutical sectors.
|
41
|
Zhao S, Jiang L, Yu H, Guo Y. GTQC: Automated Genotyping Array Quality Control and Report. J Genomics 2022; 10:39-44. [PMID: 35300047 PMCID: PMC8922302 DOI: 10.7150/jgen.69860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 01/26/2022] [Indexed: 12/16/2022] [Open Access]
Abstract
Genotyping arrays are the most economical approach for conducting large-scale genome-wide genetic association studies. Thorough quality control is key to generating high-integrity genotyping data and robust results. Quality control of genotyping arrays is generally a complicated process, as it requires intensive manual labor in implementing the established protocols and curating a comprehensive quality report. There is an urgent need to reduce manual intervention via an automated quality control process. Based on previously established protocols and strategies, we developed an R package, GTQC (GenoTyping Quality Control), to automate a majority of the quality control steps for general array genotyping data. GTQC covers a comprehensive spectrum of genotype data quality metrics and produces a detailed HTML report comprising tables and figures. Here, we describe the concepts underpinning GTQC and demonstrate its effectiveness using a real genotyping dataset. The R package GTQC streamlines a majority of the quality control steps and produces a detailed HTML report on a wide range of quality control metrics, thus enabling a swift and rigorous data quality inspection prior to downstream GWAS and related analyses. By significantly cutting down the time spent on genotyping quality control procedures, GTQC makes better use of available resources and reduces wasted manual effort. The GTQC tool can be accessed at https://github.com/slzhao/GTQC.
Affiliation(s)
- Shilin Zhao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Limin Jiang
  - Department of Internal Medicine, University of New Mexico, Comprehensive Cancer Center, Albuquerque, NM
- Hui Yu
  - Department of Internal Medicine, University of New Mexico, Comprehensive Cancer Center, Albuquerque, NM
- Yan Guo
  - Department of Internal Medicine, University of New Mexico, Comprehensive Cancer Center, Albuquerque, NM
|
42
|
Gheyas A, Vallejo-Trujillo A, Kebede A, Dessie T, Hanotte O, Smith J. Whole genome sequences of 234 indigenous African chickens from Ethiopia. Sci Data 2022; 9:53. [PMID: 35165296 PMCID: PMC8844291 DOI: 10.1038/s41597-022-01129-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/17/2021] [Accepted: 12/15/2021] [Indexed: 11/15/2022] [Open Access]
Abstract
Indigenous chickens dominate poultry production in Africa. Although preferred for backyard farming because of their adaptability to harsh tropical environments, these populations suffer from relatively low productivity compared to commercial lines. Genome analyses can unravel the genetic potential for improvement of these birds for both production and resilience traits for the benefit of African poultry farming systems. Here we report whole-genome sequences of 234 indigenous chickens from 24 Ethiopian populations distributed under diverse agro-climatic conditions. The data represent over eight terabytes of paired-end sequences from the Illumina HiSeq X platform with an average coverage of about 57X. Almost 99% of the sequence reads could be mapped against the chicken reference genome (GRCg6a), confirming the high quality of the data. Variant calling detected around 15 million SNPs, of which about 86% are known variants (i.e., present in public databases), providing further confidence in the data quality. The dataset provides an excellent resource for investigating genetic diversity and local environmental adaptations with important implications for breed improvement and conservation purposes.
Measurement(s): genome
Technology Type(s): DNA sequencing
Factor Type(s): animal population
Sample Characteristic - Organism: Gallus gallus
Sample Characteristic - Location: Ethiopia
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.16999891
|
43
|
Forensic Genetic Genealogy using microarrays for the identification of human remains: the need for good quality samples – a pilot study. Forensic Sci Int 2022; 334:111242. [DOI: 10.1016/j.forsciint.2022.111242] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/02/2021] [Revised: 01/31/2022] [Accepted: 02/23/2022] [Indexed: 11/20/2022]
|
44
|
Çelik G, Tuncalı T. ROHMM: A flexible hidden Markov model framework to detect runs of homozygosity from genotyping data. Hum Mutat 2021; 43:158-168. [PMID: 34923717 DOI: 10.1002/humu.24316] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2021] [Revised: 11/29/2021] [Accepted: 12/15/2021] [Indexed: 11/05/2022]
Abstract
Long runs of homozygosity (ROHs) are considered to be the result of consanguinity and usually contain recessive deleterious disease-causing mutations. Several algorithms have been developed to detect ROHs. Here, we developed a simple alternative strategy, ROHMM, which detects ROHs from next-generation sequencing data by examining the X chromosome non-pseudoautosomal region and using genotype probabilities together with a hidden Markov model. It is implemented purely in Java and provides both a command-line and a graphical user interface. We tested ROHMM on simulated data as well as real population data from the 1000G Project and a clinical sample. Our results show that ROHMM performs robustly, producing highly accurate homozygosity estimates under all conditions, thereby meeting and even exceeding the performance of its natural competitors.
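To make the hidden Markov model idea concrete, the following is a minimal two-state Viterbi sketch. It is not ROHMM's model: it works on hard homozygous/heterozygous calls rather than genotype probabilities, and the state names, emission rates and switch probability are assumed values chosen only for illustration.

```python
import math


def viterbi_roh(genotypes, p_het_roh=0.01, p_het_non=0.3, p_switch=0.05):
    """Label each site 'ROH' or 'non-ROH' with a two-state HMM.

    genotypes: sequence of 0 (homozygous call) or 1 (heterozygous call).
    All probabilities are illustrative assumptions, not fitted values.
    """
    if not genotypes:
        return []
    states = ("ROH", "non-ROH")
    p_het = {"ROH": p_het_roh, "non-ROH": p_het_non}
    log_stay = math.log(1.0 - p_switch)
    log_switch = math.log(p_switch)

    def emit(state, g):
        # Log-probability of observing call g in the given state.
        return math.log(p_het[state] if g == 1 else 1.0 - p_het[state])

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Initialise with a uniform prior over the two states.
    score = {s: math.log(0.5) + emit(s, genotypes[0]) for s in states}
    backpointers = []
    for g in genotypes[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + trans(r, s))
            pointers[s] = best_prev
            new_score[s] = score[best_prev] + trans(best_prev, s) + emit(s, g)
        score = new_score
        backpointers.append(pointers)

    # Trace the most likely state path backwards from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical calls: a 30-site homozygous stretch flanked by heterozygous sites.
calls = [1, 0, 1, 1] + [0] * 30 + [1, 1, 0, 1]
labels = viterbi_roh(calls)
```

With these assumed parameters, an isolated homozygous call between heterozygous sites is not enough to pay the cost of two state switches, whereas the long homozygous stretch is labelled as ROH.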
Affiliation(s)
- Gökalp Çelik
  - Health Sciences Institute, Department of Medical Genetics, Ankara Yildirim Beyazit University, Ankara, Turkey
- Timur Tuncalı
  - Department of Medical Genetics, Ankara University School of Medicine, Ankara, Turkey
|
45
|
Durward-Akhurst SA, Schaefer RJ, Grantham B, Carey WK, Mickelson JR, McCue ME. Genetic Variation and the Distribution of Variant Types in the Horse. Front Genet 2021; 12:758366. [PMID: 34925451 PMCID: PMC8676274 DOI: 10.3389/fgene.2021.758366] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/16/2021] [Accepted: 11/10/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Genetic variation is a key contributor to health and disease. Understanding the link between an individual's genotype and the corresponding phenotype is a major goal of medical genetics. Whole genome sequencing (WGS) within and across populations enables highly efficient variant discovery and elucidation of the molecular nature of virtually all genetic variation. Here, we report the largest catalog of genetic variation for the horse, a species of importance as a model for human athletic and performance related traits, using WGS of 534 horses. We show the extent of agreement between two commonly used variant callers. In data from ten target breeds that represent major breed clusters in the domestic horse, we demonstrate the distribution of variants, their allele frequencies across breeds, and identify variants that are unique to a single breed. We investigate variants with no homozygotes that may be potential embryonic lethal variants, as well as variants present in all individuals that likely represent regions of the genome with errors, poor annotation or where the reference genome carries a variant. Finally, we show regions of the genome that have higher or lower levels of genetic variation compared to the genome average. This catalog can be used for variant prioritization for important equine diseases and traits, and to provide key information about regions of the genome where the assembly and/or annotation need to be improved.
Affiliation(s)
- S. A. Durward-Akhurst
  - Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, MN, United States
- R. J. Schaefer
  - Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, MN, United States
- B. Grantham
  - Interval Bio LLC, Mountain View, CA, United States
- W. K. Carey
  - Interval Bio LLC, Mountain View, CA, United States
- J. R. Mickelson
  - Department of Veterinary and Biomedical Sciences, University of Minnesota, Minneapolis, MN, United States
- M. E. McCue
  - Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, MN, United States
|
46
|
O'Grady CJ, Dhandapani V, Colbourne JK, Frisch D. Refining the evolutionary time machine: An assessment of whole genome amplification using single historical Daphnia eggs. Mol Ecol Resour 2021; 22:946-961. [PMID: 34672105 DOI: 10.1111/1755-0998.13524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/17/2021] [Revised: 09/03/2021] [Accepted: 09/07/2021] [Indexed: 12/14/2022]
Abstract
Whole genome sequencing is instrumental for the study of genome variation in natural populations, delivering important knowledge on genomic modifications and potential targets of natural selection at the population level. Large dormant eggbanks of aquatic invertebrates such as the keystone herbivore Daphnia, a microcrustacean widespread in freshwater ecosystems, provide detailed sedimentary archives to study genomic processes over centuries. To overcome the problem of limited DNA amounts in single Daphnia dormant eggs, we developed an optimized workflow for whole genome amplification (WGA), yielding sufficient amounts of DNA for downstream whole genome sequencing of individual historical eggs, including polyploid lineages. We compare two WGA kits, applied to recently produced Daphnia magna dormant eggs from laboratory cultures, and to historical dormant eggs of Daphnia pulicaria collected from Arctic lake sediment between 10 and 300 years old. Resulting genome coverage breadth in most samples was ~70%, including those from >100-year-old isolates. Sequence read distribution was highly correlated among samples amplified with the same kit, but less correlated between kits. Despite this, a high percentage of genomic positions with single nucleotide polymorphisms in one or more samples (maximum of 74% between kits, and 97% within kits) were recovered at a depth required for genotyping. As a by-product of sequencing we obtained 100% coverage of the mitochondrial genomes even from the oldest isolates (~300 years). The mitochondrial DNA provides an additional source for evolutionary studies of these populations. We provide an optimized workflow for WGA followed by whole genome sequencing including steps to minimize exogenous DNA.
Affiliation(s)
- Christopher James O'Grady
  - School of Life Sciences, University of Warwick, Coventry, UK
  - Cell and Gene Therapy Catapult, London, UK
  - School of Biosciences, University of Birmingham, Birmingham, UK
- Dagmar Frisch
  - School of Biosciences, University of Birmingham, Birmingham, UK
  - Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany
|
47
|
Patil AB, Vijay N. Repetitive genomic regions and the inference of demographic history. Heredity (Edinb) 2021; 127:151-166. [PMID: 34002046 PMCID: PMC8322061 DOI: 10.1038/s41437-021-00443-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/20/2021] [Revised: 04/16/2021] [Accepted: 04/17/2021] [Indexed: 02/03/2023] [Open Access]
Abstract
Inference of demographic histories using whole-genome datasets has provided insights into diversification, adaptation, hybridization, and plant-pathogen interactions, and stimulated debate on the impact of anthropogenic interventions and past climate on species demography. However, the impact of repetitive genomic regions on these inferences has mostly been ignored by masking of repeats. We use the Populus trichocarpa genome (Pop_tri_v3) to show that masking of repeat regions leads to lower estimates of effective population size (Ne) in the distant past in contrast to an increase in Ne estimates in recent times. However, in human datasets, masking of repeats resulted in lower estimates of Ne at all time points. We demonstrate that repeats affect demographic inferences using diverse methods like PSMC, MSMC, SMC++, and the Stairway plot. Our genomic analysis revealed that the biases in Ne estimates were dependent on the repeat class type and its abundance in each atomic interval. Notably, we observed a weak, yet consistently significant negative correlation between the repeat abundance of an atomic interval and the Ne estimates for that interval, which potentially reflects the recombination rate variation within the genome. The rationale for the masking of repeats has been that variants identified within these regions are erroneous. We find that polymorphisms in some repeat classes occur in callable regions and reflect reliable coalescence histories (e.g., LTR Gypsy, LTR Copia). The current demography inference methods do not handle repeats explicitly, and hence the effect of individual repeat classes needs careful consideration in comparative analysis. Deciphering the repeat demographic histories might provide a clear understanding of the processes involved in repeat accumulation.
Affiliation(s)
- Ajinkya Bharatraj Patil
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
|
48
|
The Mutational Robustness of the Genetic Code and Codon Usage in Environmental Context: A Non-Extremophilic Preference? Life (Basel) 2021; 11:773. [PMID: 34440517 PMCID: PMC8398314 DOI: 10.3390/life11080773] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/16/2021] [Revised: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 12/12/2022] [Open Access]
Abstract
The genetic code evolved, to some extent, to minimize the effects of mutations. The effects of mutations depend on the amino acid repertoire, the structure of the genetic code and the frequencies of amino acids in proteomes. The amino acid compositions of proteins and the corresponding codon usages are still under selection, which allows us to ask what kind of environment the standard genetic code is adapted to. Using simple computational models and comprehensive datasets comprising genomic and environmental data from all three domains of Life, we estimate the expected severity of non-synonymous genomic mutations in proteins, measured by the change in amino acid physicochemical properties. We show that the fidelity in these physicochemical properties is expected to deteriorate with extremophilic codon usages, especially in thermophiles. These findings suggest that the genetic code performs better under non-extremophilic conditions, which explains the low substitution rates encountered in halophiles and thermophiles. The revealed relationship between the genetic code and habitat also allows us to ponder earlier phases in the history of Life.
|
49
|
Phylogenetic analysis of mutational robustness based on codon usage supports that the standard genetic code does not prefer extreme environments. Sci Rep 2021; 11:10963. [PMID: 34040064 PMCID: PMC8154912 DOI: 10.1038/s41598-021-90440-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2021] [Accepted: 05/10/2021] [Indexed: 02/04/2023] [Open Access]
Abstract
The mutational robustness of the genetic code is rarely discussed in the context of biological diversity, such as codon usage and related factors, and is often considered independent of the actual organism's proteome. Here we put living beings back into the picture and use distortion as a metric of mutational robustness. Distortion estimates the expected severity of non-synonymous mutations, measured by amino acid physicochemical properties and weighted by codon usage. Using the biological variance of codon frequencies, we interpret the mutational robustness of the standard genetic code with regard to the corresponding environments and genomic compositions (GC-content). Employing phylogenetic analyses, we show that coding fidelity in physicochemical properties can deteriorate with codon usages adapted to extreme environments and that these putative effects are not artefacts of phylogenetic bias. High temperature environments select for codon usages with decreased mutational robustness of hydrophobic, volumetric, and isoelectric properties. Selection at high saline concentrations also leads to reduced fidelity in polar and isoelectric patterns. These results show that the genetic code performs best with mesophilic codon usages, strengthening the view that LUCA or its ancestors preferred lower temperature environments. Taxonomic implications, such as rooting the tree of life, are also discussed.
|
50
|
An H, Lee HY, Shim D, Choi SH, Cho H, Hyun TK, Jo IH, Chung JW. Development of CAPS Markers for Evaluation of Genetic Diversity and Population Structure in the Germplasm of Button Mushroom (Agaricus bisporus). J Fungi (Basel) 2021; 7:375. [PMID: 34064696 PMCID: PMC8151297 DOI: 10.3390/jof7050375] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2021] [Revised: 05/04/2021] [Accepted: 05/10/2021] [Indexed: 01/24/2023] [Open Access]
Abstract
Agaricus bisporus is a globally cultivated mushroom with high economic value. Despite its widespread cultivation, commercial button mushroom strains have little genetic diversity, and discrimination of strains for identification and breeding purposes is challenging. Molecular markers suitable for diversity analyses of germplasms with similar genotypes and for discrimination between accessions are needed to support the development of new varieties. To develop cleaved amplified polymorphic sequence (CAPS) markers, single nucleotide polymorphism (SNP) mining was performed based on the A. bisporus genome and resequencing data. A total of 70 sets of CAPS markers were developed and applied to 41 A. bisporus accessions for diversity, multivariate, and population structure analyses. Of the 70 SNPs, 62.85% (44/70) were transitions (G/A or C/T) and 37.15% (26/70) were transversions (A/C, A/T, C/G, or G/T). The number of alleles per locus was 1 or 2 (average = 1.9), and expected heterozygosity and gene diversity were 0.0-0.499 (mean = 0.265) and 0.0-0.9367 (mean = 0.3599), respectively. Multivariate and cluster analyses of accessions produced similar groups, with F-statistic values of 0.134 and 0.153 for distance-based and model-based groups, respectively. A minimum set of 10 markers optimized for accession identification was selected based on a high index of genetic diversity (GD, range 0.299-0.499) and major allele frequency (MAF, range 0.524-0.817). The CAPS markers can be used to evaluate genetic diversity and population structure and will facilitate the management of emerging genetic resources.
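Two quantities in this abstract are simple to compute, and a short sketch may help readers check them. The code below classifies a SNP as a transition or transversion and computes gene diversity at a locus as one minus the sum of squared allele frequencies. That formula is an assumption about how GD was defined (the abstract does not state it), and the function names are hypothetical. Under that assumption, the two ends of the reported major allele frequency range at a biallelic locus reproduce the two ends of the reported GD range.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' if both bases are purines or both are pyrimidines,
    otherwise 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def gene_diversity(allele_freqs):
    """Gene diversity at one locus: 1 - sum(p_i^2) (assumed definition)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Biallelic loci at the two ends of the reported major allele frequency range.
gd_low_maf = gene_diversity([0.524, 0.476])
gd_high_maf = gene_diversity([0.817, 0.183])
print(round(gd_low_maf, 3), round(gd_high_maf, 3))
```

The inverse relationship is the reason a marker panel chosen for discrimination favours loci whose major allele frequency is close to 0.5.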
Affiliation(s)
- Hyejin An
  - Department of Industrial Plant Science and Technology, Chungbuk National University, Cheongju 28644, Korea
- Hwa-Yong Lee
  - Department of Forest Science, Chungbuk National University, Cheongju 28644, Korea
- Donghwan Shim
  - Department of Biological Science, Chungnam National University, Daejeon 34134, Korea
- Seong Ho Choi
  - Department of Animal Science, Chungbuk National University, Cheongju 28644, Korea
- Hyunwoo Cho
  - Department of Industrial Plant Science and Technology, Chungbuk National University, Cheongju 28644, Korea
- Tae Kyung Hyun
  - Department of Industrial Plant Science and Technology, Chungbuk National University, Cheongju 28644, Korea
- Ick-Hyun Jo
  - National Institute of Horticultural and Herbal Science, RDA, Eumseong 27709, Korea
- Jong-Wook Chung
  - Department of Industrial Plant Science and Technology, Chungbuk National University, Cheongju 28644, Korea
|