1
Wei Y, Li X, Zhu Q, Shan T, Wang H, Dai X, Wang Y, Zhang J. Are microhaplotypes derived from the 1000 Genomes Project reliable for forensic purposes? Forensic Sci Int Genet 2025; 78:103273. [PMID: 40106853 DOI: 10.1016/j.fsigen.2025.103273] [Received: 10/11/2024] [Revised: 02/25/2025] [Accepted: 03/12/2025] [Indexed: 03/22/2025]
Abstract
Microhaplotypes (MHs) have emerged as an important genetic marker in forensic genetics. However, most studies have overlooked the potential for phasing errors within microhaplotypes derived from the 1000 Genomes Project (1kGP), which may affect the evaluation of various forensic parameters and lead to misleading results. In this study, we constructed a dense and extensive genome-wide set of MHs using the expanded 1kGP data aligned to the GRCh38 reference genome. We applied three SNP minor allele frequency (MAF) thresholds (0, 0.01, and 0.05) to evaluate the reliability of these markers. Using pedigree data from 18 populations comprising a total of 602 trios, we scanned for and confirmed suspected phasing error events at these MH loci. We also sequenced 50 MHs in one trio from the Southern Han Chinese (CHS) population to further investigate these discrepancies. Targeted enrichment and next-generation sequencing confirmed the presence of phasing errors in 1kGP-derived MHs. The probability of suspected phasing error events was strongly and positively correlated with the effective number of alleles (Ae) and the informativeness (In) of the markers, and these mismatch probabilities varied significantly across continental populations. Moreover, applying MAF filtering and avoiding regions such as the MHC during locus selection can reduce the occurrence of such events to some extent. Based on these findings, we suggest that relying solely on 1kGP sequencing data for forensic purposes may be risky. A thorough investigation of the true forensic parameters of MHs is essential to ensure their reliability in forensic applications.
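Two ideas in this abstract can be made concrete. The effective number of alleles of an MH is conventionally Ae = 1 / sum(p_i^2), where p_i is the population frequency of each distinct haplotype, and a trio check requires that a child's two phased haplotypes be explainable as one maternal and one paternal haplotype. The sketch below is a hypothetical illustration, not the authors' pipeline: it assumes each haplotype is a string of alleles at the MH's SNPs, and all function names are invented. A trio whose per-SNP genotypes are Mendelian-consistent but whose haplotypes are not is only a suspected phasing error, because recombination within the MH, de novo mutation, or genotyping error can produce the same pattern.

```python
from collections import Counter
from typing import Sequence, Tuple

# Hypothetical representation: one individual's two phased MH haplotypes,
# each a string of alleles at the MH's SNPs, e.g. ("AC", "GT").
Pair = Tuple[str, str]


def effective_number_of_alleles(haplotypes: Sequence[str]) -> float:
    """Ae = 1 / sum(p_i^2), where p_i is the sample frequency of haplotype i."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


def genotypes_consistent(child: Pair, mother: Pair, father: Pair) -> bool:
    """Per-SNP Mendelian check on unphased genotypes (phase is ignored)."""
    for i in range(len(child[0])):
        observed = sorted(h[i] for h in child)
        maternal = {h[i] for h in mother}
        paternal = {h[i] for h in father}
        if not any(sorted((m, p)) == observed for m in maternal for p in paternal):
            return False
    return True


def haplotypes_consistent(child: Pair, mother: Pair, father: Pair) -> bool:
    """True if one child haplotype matches a maternal and the other a paternal haplotype."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c1 in father and c2 in mother)


def suspected_phasing_error(child: Pair, mother: Pair, father: Pair) -> bool:
    """Genotypes pass the per-SNP check, but the phased haplotypes do not."""
    return genotypes_consistent(child, mother, father) and not haplotypes_consistent(
        child, mother, father
    )
```

For example, with a mother phased as AC/GT and a father as AT/GC, a child phased as AC/GT passes every per-SNP check yet cannot be assembled from one maternal and one paternal haplotype, which is the kind of event treated here as a suspected phasing error.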
Affiliation(s)
- Yifan Wei
- West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Xi Li
- Jiaozuo Health Commission, Jiaozuo 454000, China
- Qiang Zhu
- West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Tiantian Shan
- West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Haoyu Wang
- West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Xuan Dai
- West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Yufang Wang
- West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Ji Zhang
- West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
4
Liu Q, Song Q, Yu Y, Shi Y, Lu M, Chen Y, Tan L. Whole chloroplast genome sequence and phylogenetic analysis of Calanthe discolor (Orchidaceae). Mitochondrial DNA B Resour 2024; 9:1345-1349. [PMID: 39377034 PMCID: PMC11457340 DOI: 10.1080/23802359.2024.2411376] [Received: 12/29/2022] [Accepted: 09/24/2024] [Indexed: 10/09/2024]
Abstract
The orchid Calanthe discolor, which has high ornamental and medicinal value, is mainly distributed in Zhejiang, Jiangsu, and southeast Hubei Provinces of China, as well as in Japan and the southern Korean peninsula. In this study, the whole chloroplast genome of C. discolor was assembled for the first time from high-throughput Illumina paired-end sequencing data, providing a resource for evaluating the evolution of this species. The C. discolor chloroplast genome is 158,286 bp long and comprises a large single-copy region of 87,095 bp, a small single-copy region of 18,407 bp, and two copies of an inverted repeat region (26,392 bp each). The overall G + C content is 41.2%. A total of 133 genes were predicted from the genome, including 87 protein-coding genes, eight ribosomal RNA genes, and 38 transfer RNA genes. Phylogenetic analysis indicated a close relationship between C. discolor and C. bicolor.
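The reported figures are internally consistent, which is worth confirming when citing them. Assuming the usual quadripartite chloroplast layout (a large single-copy region, LSC; a small single-copy region, SSC; and two inverted repeats, IR), the region lengths sum to the stated genome length, and the three gene classes sum to the 133 predicted genes:

```latex
\begin{aligned}
\mathrm{LSC} + \mathrm{SSC} + 2 \times \mathrm{IR}
  &= 87{,}095 + 18{,}407 + 2 \times 26{,}392 \\
  &= 105{,}502 + 52{,}784 = 158{,}286~\text{bp} \\
n_{\text{genes}} &= 87 + 8 + 38 = 133
\end{aligned}
```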
Affiliation(s)
- Qiuping Liu
- School of Life Sciences, Guizhou Normal University, Guiyang, China
- Qin Song
- School of Life Sciences, Guizhou Normal University, Guiyang, China
- Yan Yu
- Guizhou Tobacco Company, Duyun, China
- Yiming Shi
- School of Life Sciences, Guizhou Normal University, Guiyang, China
- Minghui Lu
- School of Life Sciences, Guizhou Normal University, Guiyang, China
- Yan Chen
- School of Life Sciences, Guizhou Normal University, Guiyang, China
- Leitao Tan
- School of Life Sciences, Guizhou Normal University, Guiyang, China
5
Whiting JR, Booker TR, Rougeux C, Lind BM, Singh P, Lu M, Huang K, Whitlock MC, Aitken SN, Andrew RL, Borevitz JO, Bruhl JJ, Collins TL, Fischer MC, Hodgins KA, Holliday JA, Ingvarsson PK, Janes JK, Khandaker M, Koenig D, Kreiner JM, Kremer A, Lascoux M, Leroy T, Milesi P, Murray KD, Pyhäjärvi T, Rellstab C, Rieseberg LH, Roux F, Stinchcombe JR, Telford IRH, Todesco M, Tyrmi JS, Wang B, Weigel D, Willi Y, Wright SI, Zhou L, Yeaman S. The genetic architecture of repeated local adaptation to climate in distantly related plants. Nat Ecol Evol 2024; 8:1933-1947. [PMID: 39187610 PMCID: PMC11461274 DOI: 10.1038/s41559-024-02514-5] [Received: 11/03/2023] [Accepted: 07/22/2024] [Indexed: 08/28/2024]
Abstract
Closely related species often use the same genes to adapt to similar environments. However, we know little about why such genes possess increased adaptive potential and whether this is conserved across deeper evolutionary lineages. Adaptation to climate presents a natural laboratory to test these ideas, as even distantly related species must contend with similar stresses. Here, we re-analyse genomic data from thousands of individuals from 25 plant species as diverged as lodgepole pine and Arabidopsis (~300 Myr). We test for genetic repeatability based on within-species associations between allele frequencies in genes and variation in 21 climate variables. Our results demonstrate significant statistical evidence for genetic repeatability across deep time that is not expected under randomness, identifying a suite of 108 gene families (orthogroups) and gene functions that repeatedly drive local adaptation to climate. This set includes many orthogroups with well-known functions in abiotic stress response. Using gene co-expression networks to quantify pleiotropy, we find that orthogroups with stronger evidence for repeatability exhibit greater network centrality and broader expression across tissues (higher pleiotropy), contrary to the 'cost of complexity' theory. These gene families may be important in helping wild and crop species cope with future climate change, representing important candidates for future study.
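The abstract describes testing within-species associations between allele frequencies and climate variables. One generic way to score such an association for a single SNP is a Spearman rank correlation between population allele frequencies and a climate variable measured at the same sampling sites. This is only an illustrative sketch: the abstract does not specify the authors' test statistic or how SNP-level results are aggregated into gene- and orthogroup-level evidence, and the function names below are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values in sorted order.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(allele_freqs: Sequence[float], climate: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(allele_freqs), average_ranks(climate))
```

Ranking makes the score insensitive to monotone transformations of the climate variable, which is one reason rank-based correlations are a common starting point for this kind of scan.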
Affiliation(s)
- James R Whiting
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Tom R Booker
- Department of Zoology, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, British Columbia, Canada
- Clément Rougeux
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Brandon M Lind
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, British Columbia, Canada
- Pooja Singh
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Aquatic Ecology and Evolution, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- EAWAG, Swiss Federal Institute of Aquatic Science and Technology, Kastanienbaum, Switzerland
- Mengmeng Lu
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
- Kaichi Huang
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
- Michael C Whitlock
- Department of Zoology, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Sally N Aitken
- Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, British Columbia, Canada
- Rose L Andrew
- School of Environmental and Rural Science, University of New England, Armidale, New South Wales, Australia
- Justin O Borevitz
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Jeremy J Bruhl
- School of Environmental and Rural Science, University of New England, Armidale, New South Wales, Australia
- Timothy L Collins
- Department of Planning and Environment, Queanbeyan, New South Wales, Australia
- Department of Climate Change, Energy, the Environment and Water, Queanbeyan, New South Wales, Australia
- Martin C Fischer
- ETH Zurich: Institute of Integrative Biology (IBZ), ETH Zurich, Zurich, Switzerland
- Kathryn A Hodgins
- School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
- Jason A Holliday
- Department of Forest Resources and Environmental Conservation, Virginia Tech, Blacksburg, VA, USA
- Pär K Ingvarsson
- Department of Plant Biology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Jasmine K Janes
- Biology Department, Vancouver Island University, Nanaimo, British Columbia, Canada
- Department of Ecosystem Science and Management, University of Northern British Columbia, Prince George, British Columbia, Canada
- Species Survival Commission, Orchid Specialist Group, IUCN North America, Washington, DC, USA
- Momena Khandaker
- School of Environmental and Rural Science, University of New England, Armidale, New South Wales, Australia
- Daniel Koenig
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
- Institute for Integrative Genome Biology, University of California, Riverside, CA, USA
- Julia M Kreiner
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Antoine Kremer
- UMR BIOGECO, INRAE, Université de Bordeaux; 69 Route d'Arcachon, Cestas, France
- Martin Lascoux
- Program in Plant Ecology and Evolution, Department of Ecology and Genetics, Evolutionary Biology Centre and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Thibault Leroy
- GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
- Pascal Milesi
- Program in Plant Ecology and Evolution, Department of Ecology and Genetics, Evolutionary Biology Centre and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Kevin D Murray
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Tanja Pyhäjärvi
- Department of Forest Sciences, University of Helsinki, Helsinki, Finland
- Viikki Plant Science Centre, University of Helsinki, Helsinki, Finland
- Loren H Rieseberg
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
- Fabrice Roux
- Laboratoire des Interactions Plantes-Microbes-Environnement, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, CNRS, Université de Toulouse, Castanet-Tolosan, France
- John R Stinchcombe
- Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Ian R H Telford
- School of Environmental and Rural Science, University of New England, Armidale, New South Wales, Australia
- Marco Todesco
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Biology, University of British Columbia, Kelowna, British Columbia, Canada
- Jaakko S Tyrmi
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Baosheng Wang
- South China National Botanical Garden, Guangzhou, China
- Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Yvonne Willi
- Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Stephen I Wright
- Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Lecong Zhou
- Department of Forest Resources and Environmental Conservation, Virginia Tech, Blacksburg, VA, USA
- Sam Yeaman
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
6
Kotlarz K, Mielczarek M, Biecek P, Guldbrandtsen B, Szyda J. Exploring the impact of sequence context on errors in SNP genotype calling with whole genome sequencing data using AI-based autoencoder approach. NAR Genom Bioinform 2024; 6:lqae131. [PMID: 39318508 PMCID: PMC11420682 DOI: 10.1093/nargab/lqae131] [Received: 03/28/2024] [Revised: 08/23/2024] [Accepted: 09/06/2024] [Indexed: 09/26/2024]
Abstract
A critical step in the analysis of whole genome sequencing data is variant calling. Despite its importance, variant calling is prone to errors. Our study investigated the association of incorrect single nucleotide polymorphism (SNP) calls with variant quality metrics and nucleotide context. Incorrect SNPs were identified in 20 Holstein-Friesian cows by comparing SNP genotypes called from whole genome sequencing on the Illumina NovaSeq 6000 with genotypes from the EuroGMD50K genotyping microarray. The dataset was divided into a correct SNP set (666,333 SNPs) and an incorrect SNP set (4,557 SNPs). The training dataset consisted only of correct SNPs, while the test dataset contained a balanced mix of all the incorrectly and correctly called SNPs. An autoencoder was constructed to identify systematically incorrect SNPs, which were marked as outliers by one-class support vector machine and isolation forest algorithms. The results showed that 59.53% (±0.39%) of the incorrect SNPs followed systematic patterns, with the remainder being random errors. The frequent occurrence of the CGC 3-mer was due to mislabelled C calls. An incorrect T call in place of A was associated with the presence of T at the neighbouring downstream position. These errors may arise from the fluorescence patterns of nucleotide labelling.
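The definition of incorrect SNPs and the one-class training setup described in this abstract can be sketched as follows. The sketch assumes genotypes are unphased two-letter strings keyed by SNP ID, that a SNP counts as incorrect when its WGS genotype differs from the array genotype, and that the balanced test set pairs every incorrect SNP with an equal number of randomly drawn correct SNPs held out from training. These are assumptions for illustration, the function names are hypothetical, and the autoencoder and outlier detectors themselves are not reproduced.

```python
import random
from typing import Dict, Optional, Set, Tuple


def classify_calls(wgs: Dict[str, str], array: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Split SNP IDs into (correct, incorrect) by comparing unordered genotypes.

    Array genotypes are treated as the truth set; SNPs without a WGS call are skipped.
    """
    correct, incorrect = set(), set()
    for snp, gt_array in array.items():
        gt_wgs = wgs.get(snp)
        if gt_wgs is None:
            continue  # not called from WGS, so concordance cannot be assessed
        if sorted(gt_wgs) == sorted(gt_array):  # "AG" and "GA" are the same genotype
            correct.add(snp)
        else:
            incorrect.add(snp)
    return correct, incorrect


def three_mer(reference: str, pos: int) -> Optional[str]:
    """Reference 3-mer centred on 0-based position pos; None at either sequence end."""
    if pos < 1 or pos + 1 >= len(reference):
        return None
    return reference[pos - 1 : pos + 2]


def split_for_one_class(correct: Set[str], incorrect: Set[str], seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Train on correct SNPs only; test on all incorrect SNPs plus as many held-out correct ones.

    Raises ValueError (from random.sample) if there are fewer correct than incorrect SNPs.
    """
    rng = random.Random(seed)
    held_out = set(rng.sample(sorted(correct), len(incorrect)))
    return correct - held_out, held_out | incorrect
```

Sorting before sampling makes the split reproducible for a given seed, since set iteration order is not guaranteed to be stable across runs.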
Affiliation(s)
- Krzysztof Kotlarz
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw 51-631, Poland
- Magda Mielczarek
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw 51-631, Poland
- Przemysław Biecek
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw 00-662, Poland
- Institute of Informatics, University of Warsaw, Warsaw 02-097, Poland
- Bernt Guldbrandtsen
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg C 1870, Denmark
- Joanna Szyda
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw 51-631, Poland
7
Jambulingam D, Rathinakannan VS, Heron S, Schleutker J, Fey V. Kuura-An automated workflow for analyzing WES and WGS data. PLoS One 2024; 19:e0296785. [PMID: 38236904 PMCID: PMC10796025 DOI: 10.1371/journal.pone.0296785] [Received: 05/15/2023] [Accepted: 12/19/2023] [Indexed: 01/22/2024]
Abstract
The advent of high-throughput sequencing technologies has revolutionized the field of genomic sciences by cutting down the cost and time associated with standard sequencing methods. This advancement has not only provided the research community with an abundance of data but has also presented the challenge of analyzing it. The paramount challenge in analyzing such large volumes of data is making optimal use of the available tools. To address this research gap, we propose "Kuura-An automated workflow for analyzing WES and WGS data", which is optimized for both whole exome and whole genome sequencing data. The workflow is written in the Nextflow pipeline scripting language and uses Docker to manage and deploy it. It consists of four analysis stages: quality control; mapping to the reference genome and quality score recalibration; variant calling and variant recalibration; and variant consensus and annotation. An important feature of the DNA-seq workflow is that it combines multiple variant callers (GATK HaplotypeCaller, DeepVariant, VarScan2, FreeBayes and Strelka2) to generate a list of high-confidence variants in a consensus call file. The workflow is flexible: it integrates otherwise fragmented tools and can easily be extended by adding or updating tools or amending the parameter list. The use of a single parameter file enhances the reproducibility of results. The ease of deploying and using the workflow further increases computational reproducibility, providing researchers with a standardized tool for the variant calling step across projects. The source code and instructions for installing and using the tool are publicly available at our GitHub repository https://github.com/dhanaprakashj/kuura_pipeline.
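The abstract states that Kuura combines several variant callers into a consensus call file but does not say how the calls are merged. One simple rule, shown here only as a hypothetical sketch, keeps a variant when at least `min_callers` distinct callers report it. The variant key (chromosome, position, REF, ALT) and the function name are assumptions; real VCF records would first need normalization (for example left-alignment and splitting of multi-allelic sites) so that equivalent calls from different tools compare equal.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, REF, ALT).
Variant = Tuple[str, int, str, str]


def consensus_calls(
    calls_by_caller: Dict[str, Iterable[Variant]], min_callers: int = 2
) -> Set[Variant]:
    """Keep variants reported by at least `min_callers` distinct callers."""
    votes: Counter = Counter()
    for variants in calls_by_caller.values():
        votes.update(set(variants))  # each caller votes at most once per variant
    return {variant for variant, n in votes.items() if n >= min_callers}
```

With five callers, `min_callers=3` would amount to a simple majority vote; whether Kuura uses a threshold like this, a weighted scheme, or another strategy is not stated in the abstract.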
Affiliation(s)
- Dhanaprakash Jambulingam
- Institute of Biomedicine, Cancer Research Unit and FICAN West Cancer Centre, University of Turku and Turku University Hospital, Turku, Finland
- Venkat Subramaniam Rathinakannan
- Institute of Biomedicine, Cancer Research Unit and FICAN West Cancer Centre, University of Turku and Turku University Hospital, Turku, Finland
- Samuel Heron
- Institute of Biomedicine, Cancer Research Unit and FICAN West Cancer Centre, University of Turku and Turku University Hospital, Turku, Finland
- Johanna Schleutker
- Institute of Biomedicine, Cancer Research Unit and FICAN West Cancer Centre, University of Turku and Turku University Hospital, Turku, Finland
- Department of Genomics, Laboratory Division, Turku University Hospital, Turku, Finland
- Vidal Fey
- Institute of Biomedicine, Cancer Research Unit and FICAN West Cancer Centre, University of Turku and Turku University Hospital, Turku, Finland
- Faculty of Medicine and Health Technology/BioMediTech, Tampere University, Tampere, Finland