1
Nolen ZJ. PopGLen: a Snakemake pipeline for performing population genomic analyses using genotype likelihood-based methods. Bioinformatics 2025; 41:btaf105. [PMID: 40067089 PMCID: PMC11932725 DOI: 10.1093/bioinformatics/btaf105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Revised: 02/11/2025] [Accepted: 03/06/2025] [Indexed: 03/26/2025] [Open access]
Abstract
SUMMARY: PopGLen is a Snakemake workflow for performing population genomic analyses within a genotype-likelihood framework, integrating steps for raw sequence processing of both historical and modern DNA, quality control, multiple filtering schemes, and population genomic analysis. Currently, the population genomic analyses included allow for estimating linkage disequilibrium, kinship, genetic diversity, genetic differentiation, population structure, inbreeding, and allele frequencies. Through Snakemake, it is highly scalable, and all steps of the workflow are automated, with results compiled into an HTML report. PopGLen provides an efficient, customizable, and reproducible option for analyzing population genomic datasets across a wide variety of organisms.
AVAILABILITY AND IMPLEMENTATION: PopGLen is available under GPLv3 with code, documentation, and a tutorial at https://github.com/zjnolen/PopGLen. An example HTML report using the tutorial dataset is included in the Supplementary Material.
2
Bozzi D, Neuenschwander S, Cruz Dávalos DI, Sousa da Mota B, Schroeder H, Moreno-Mayar JV, Allentoft ME, Malaspinas AS. Towards predicting the geographical origin of ancient samples with metagenomic data. Sci Rep 2024; 14:21794. [PMID: 39294129 PMCID: PMC11411106 DOI: 10.1038/s41598-023-40246-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 08/07/2023] [Indexed: 09/20/2024] [Open access]
Abstract
Reconstructing the history of an individual sample, such as the place of birth and death, is a fundamental goal in ancient DNA (aDNA) studies. However, knowing the place of death can be particularly challenging when samples come from museum collections with incomplete or erroneous archives. While analyses of human DNA and isotope data can inform us about the ancestry of an individual and provide clues about where the person lived, they cannot specifically trace the place of death. Moreover, while ancient human DNA can be retrieved, a large fraction of the sequenced molecules in ancient DNA studies derive from exogenous DNA. This DNA, which is usually discarded in aDNA analyses, consists mostly of microbial DNA from soil-dwelling microorganisms that have colonized the buried remains post-mortem. In this study, we hypothesize that remains of individuals buried in the same or close geographic areas, exposed to similar microbial communities, could harbor more similar metagenomes. We propose to use metagenomic data from shotgun sequencing of ancient samples to locate the place of death of a given individual, which can also help to solve cases of sample mislabeling. We used a k-mer-based approach to compute similarity scores between metagenomic samples from different locations and propose a method based on dimensionality reduction and logistic regression to assign a geographical origin to target samples. We apply our method to several public datasets and observe that individual samples from closer geographic locations tend to show higher similarities in their metagenomes compared to those of different origin, allowing good geographical predictions of test samples. Moreover, we observe that the genus Streptomyces commonly infiltrates ancient remains and represents a valuable biomarker to trace the samples' geographic origin. Our results provide a proof of concept and show how metagenomic data can also be used to shed light on the place of origin of ancient samples.
Affiliation(s)
- Davide Bozzi
  - Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Samuel Neuenschwander
  - Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
  - Vital-IT, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Diana Ivette Cruz Dávalos
  - Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Bárbara Sousa da Mota
  - Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Hannes Schroeder
  - Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- J Víctor Moreno-Mayar
  - Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Morten E Allentoft
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Perth, WA, Australia
- Anna-Sapfo Malaspinas
  - Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
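The k-mer-based similarity step mentioned in the abstract of Bozzi et al. can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so the Jaccard index on k-mer sets is used here purely as an assumption; the function names, the value of `k`, and the toy sequences are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kmer_set(sequence, k):
    """Return the set of all overlapping k-mers in a DNA string (uppercased)."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(samples, k=4):
    """Map each unordered pair of sample names to its k-mer Jaccard similarity."""
    profiles = {name: kmer_set(seq, k) for name, seq in samples.items()}
    return {
        (x, y): jaccard(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Toy sequences invented for illustration only (not real metagenomic data).
toy_samples = {
    "site_A_1": "ACGTACGTGGCA",
    "site_A_2": "ACGTACGTGGCT",
    "site_B_1": "TTTTGGGGCCCC",
}
scores = pairwise_similarity(toy_samples, k=4)
```

In this toy setting the two "site_A" sequences share most of their k-mers and score higher than either does against "site_B_1", which mirrors the abstract's observation only in spirit. Real metagenomes would call for much longer k-mers and sketching methods rather than exact sets, and the resulting similarity matrix would then feed the dimensionality reduction and logistic regression steps that the abstract describes.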
3
Emery MV, Bolhofner K, Spake L, Ghafoor S, Versoza CJ, Rawls EM, Winingear S, Buikstra JE, Loreille O, Fulginiti LC, Stone AC. Targeted enrichment of whole-genome SNPs from highly burned skeletal remains. J Forensic Sci 2024; 69:1558-1577. [PMID: 38415845 DOI: 10.1111/1556-4029.15482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 01/13/2024] [Accepted: 01/19/2024] [Indexed: 02/29/2024]
Abstract
Genetic assessment of highly incinerated and/or degraded human skeletal material is a persistent challenge in forensic DNA analysis, including identifying victims of mass disasters. Few studies have investigated the impact of thermal degradation on whole-genome single-nucleotide polymorphism (SNP) quality and quantity using next-generation sequencing (NGS). We present whole-genome SNP data obtained from the bones and teeth of 27 fire victims using two DNA extraction techniques. Extracts were converted to double-stranded DNA libraries, then enriched for whole-genome SNPs using unpublished biotinylated RNA baits and sequenced on an Illumina NextSeq 550 platform. Raw reads were processed using the EAGER (Efficient Ancient Genome Reconstruction) pipeline, and the SNPs were filtered and called using FreeBayes and GATK (v. 3.8). Mixed-effects modeling of the data suggests that SNP variability and preservation are predominantly determined by skeletal element and burn category, and not by extraction type. Whole-genome SNP data suggest that long bones, hand and foot bones, and teeth subjected to temperatures <350°C are the most likely sources of higher genomic DNA yields. Furthermore, we observed an inverse correlation between the number of captured SNPs and the extent to which samples were burned, as well as a significant decrease in the total number of SNPs measured for samples subjected to temperatures >350°C. Our data complement previous analyses of burned human remains that compare extraction methods for downstream forensic applications and support the idea of adopting a modified Dabney extraction technique when traditional forensic methods fail to produce DNA yields sufficient for genetic identification.
Affiliation(s)
- Matthew V Emery
  - Department of Anthropology, Binghamton University, Binghamton, New York, USA
  - School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, USA
  - Center for Evolution and Medicine, Arizona State University, Life Sciences C, Tempe, Arizona, USA
- Katelyn Bolhofner
  - Center for Bioarchaeology, Arizona State University, Tempe, Arizona, USA
  - School of Interdisciplinary Forensics, Arizona State University, Glendale, Arizona, USA
- Laure Spake
  - Department of Anthropology, Binghamton University, Binghamton, New York, USA
- Suhail Ghafoor
  - Center for Evolution and Medicine, Arizona State University, Life Sciences C, Tempe, Arizona, USA
- Cyril J Versoza
  - Center for Evolution and Medicine, Arizona State University, Life Sciences C, Tempe, Arizona, USA
  - School of Life Sciences, Arizona State University, Life Sciences C, Tempe, Arizona, USA
- Erin M Rawls
  - School of Life Sciences, Arizona State University, Life Sciences C, Tempe, Arizona, USA
- Stevie Winingear
  - School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, USA
- Jane E Buikstra
  - Center for Evolution and Medicine, Arizona State University, Life Sciences C, Tempe, Arizona, USA
  - Center for Bioarchaeology, Arizona State University, Tempe, Arizona, USA
- Odile Loreille
  - FBI Laboratory, DNA Support Unit, Quantico, Virginia, USA
- Laura C Fulginiti
  - School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, USA
  - Maricopa County Office of the Medical Examiner, Phoenix, Arizona, USA
- Anne C Stone
  - School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, USA
  - Center for Evolution and Medicine, Arizona State University, Life Sciences C, Tempe, Arizona, USA
  - Center for Bioarchaeology, Arizona State University, Tempe, Arizona, USA
4
Psonis N, Vassou D, Nafplioti A, Tabakaki E, Pavlidis P, Stamatakis A, Poulakakis N. Identification of the 18 World War II executed citizens of Adele, Rethymnon, Crete using an ancient DNA approach and low coverage genomes. Forensic Sci Int Genet 2024; 71:103060. [PMID: 38796876 DOI: 10.1016/j.fsigen.2024.103060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/22/2024] [Accepted: 05/05/2024] [Indexed: 05/29/2024]
Abstract
In the Battle of Crete during the World War II occupation of Greece, the German forces faced substantial civilian resistance. In retaliation for the numerous German losses, a series of mass executions took place at many locations in Crete, a common practice reported from Greece and elsewhere. In Adele, a village in the regional unit of Rethymnon, 18 male civilians were executed and buried in a burial pit at the Sarakina site. In this study, the first one conducted for a conflict that occurred in Greece, we identified for humanitarian purposes the 18 skulls of the Sarakina victims, following a request from the local community of Adele. The molecular identification of historical human remains via ancient DNA approaches and low coverage whole genome sequencing has only recently been introduced. Here, we performed genome skimming on the living relatives of the victims, as well as high throughput historical DNA analysis on the skulls, to infer the kinship degrees among the victims via genetic relatedness analyses. We also conducted targeted anthropological analysis to successfully complete the identification of all Sarakina victims. We demonstrate that our methodological approach constitutes a potentially highly informative forensic tool to identify war victims. It can hence be applied to analogous studies on degraded DNA, thus paving the way for systematic war victim identification in Greece and beyond.
Affiliation(s)
- Nikolaos Psonis
  - Ancient DNA Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Irakleio 70013, Greece
- Despoina Vassou
  - Ancient DNA Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Irakleio 70013, Greece
- Argyro Nafplioti
  - Ancient DNA Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Irakleio 70013, Greece
- Eugenia Tabakaki
  - Ancient DNA Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Irakleio 70013, Greece
- Pavlos Pavlidis
  - Institute of Computer Science (ICS), Foundation for Research and Technology-Hellas (FORTH), Irakleio 70013, Greece
  - Department of Biology, School of Sciences and Engineering, University of Crete, Irakleio 70013, Greece
- Alexandros Stamatakis
  - Institute of Computer Science (ICS), Foundation for Research and Technology-Hellas (FORTH), Irakleio 70013, Greece
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany
- Nikos Poulakakis
  - Ancient DNA Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Irakleio 70013, Greece
  - Natural History Museum of Crete, School of Sciences and Engineering, University of Crete, Irakleio 71409, Greece
  - Department of Biology, School of Sciences and Engineering, University of Crete, Irakleio 70013, Greece
5
Childebayeva A, Zavala EI. Review: Computational analysis of human skeletal remains in ancient DNA and forensic genetics. iScience 2023; 26:108066. [PMID: 37927550 PMCID: PMC10622734 DOI: 10.1016/j.isci.2023.108066] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/07/2023] [Open access]
Abstract
Degraded DNA is used to answer questions in the fields of ancient DNA (aDNA) and forensic genetics. While aDNA studies typically center around human evolution and past history, and forensic genetics is often more concerned with identifying a specific individual, scientists in both fields face similar challenges. The overlap in source material has prompted periodic discussions and studies on the advantages of collaboration between fields toward mutually beneficial methodological advancements. However, most have centered around wet laboratory methods (sampling, DNA extraction, library preparation, etc.). In this review, we focus on the computational side of the analytical workflow. We discuss limitations and considerations to keep in mind when working with degraded DNA. We hope this review provides researchers new to computational workflows with a framework for how to think about analyzing highly degraded DNA and prompts an increase in collaboration between the forensic genetics and aDNA fields.
Affiliation(s)
- Ainash Childebayeva
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Anthropology, University of Kansas, Lawrence, KS, USA
- Elena I. Zavala
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Department of Biology, University of Oregon, Eugene, OR, USA
6
Herrick N, Walsh S. ILIAD: a suite of automated Snakemake workflows for processing genomic data for downstream applications. BMC Bioinformatics 2023; 24:424. [PMID: 37940870 PMCID: PMC10633908 DOI: 10.1186/s12859-023-05548-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 10/27/2023] [Indexed: 11/10/2023] [Open access]
Abstract
BACKGROUND: Processing raw genomic data for downstream applications such as imputation, association studies, and modeling requires numerous third-party bioinformatics software tools. It is highly time-consuming and resource-intensive with computational demands and storage limitations that pose significant challenges that increase cost. The use of software tools independent of one another, in a disjointed stepwise fashion, increases the difficulty and sets forth higher error rates because of fragmented job executions in alignment, variant calling, and/or build conversion complications. As sequencing data availability grows, the ability for biologists to process it using stable, automated, and reproducible workflows is paramount as it significantly reduces the time to generate clean and reliable data.
RESULTS: The Iliad suite of genomic data workflows was developed to provide users with seamless file transitions from raw genomic data to a quality-controlled variant call format (VCF) file for downstream applications. Iliad benefits from the efficiency of the Snakemake best practices framework coupled with Singularity and Docker containers for repeatability, portability, and ease of installation. This feat is accomplished from the onset with download acquisitions of any raw data type (FASTQ, CRAM, IDAT) straight through to the generation of a clean merged data file that can combine any user-preferred datasets using robust programs such as BWA, Samtools, and BCFtools. Users can customize and direct their workflow with one straightforward configuration file. Iliad is compatible with Linux, MacOS, and Windows platforms and scalable from a local machine to a high-performance computing cluster.
CONCLUSION: Iliad offers automated workflows with optimized time and resource management that are comparable to other workflows available but generates analysis-ready VCF files from the most common datatypes using a single command. The storage footprint challenge of genomic data is overcome by utilizing temporary intermediate files before the final VCF is generated. This file is ready for use in imputation, genome-wide association study (GWAS) pipelines, high-throughput population genetics studies, select gene candidate studies, and more. Iliad was developed to be portable, compatible, scalable, robust, and repeatable with a simplistic setup, so biologists that are less familiar with programming can manage their own big data with this open-source suite of workflows.
Affiliation(s)
- Noah Herrick
  - Department of Biology, Indiana University Indianapolis, 723 W. Michigan Street, Indianapolis, IN, USA
- Susan Walsh
  - Department of Biology, Indiana University Indianapolis, 723 W. Michigan Street, Indianapolis, IN, USA