1. Guardado M, Perez C, Campana S, Chavez Rojas B, Magaña J, Jackson S, Samperio E, Hernandez S, Syas K, Hernandez RD, Zavala EI, Rohlfs RV. py_ped_sim: a flexible forward pedigree and genetic simulator for complex family pedigree analysis. BMC Bioinformatics 2025; 26:122. [PMID: 40335952 PMCID: PMC12060417 DOI: 10.1186/s12859-025-06142-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Accepted: 04/14/2025] [Indexed: 05/09/2025]
Abstract
Background: Large-scale family pedigrees are commonly used across medical, evolutionary, and forensic genetics. These pedigrees are tools for identifying genetic disorders, tracking evolutionary patterns, and establishing familial relationships via forensic genetic identification. However, there is a lack of software to accurately simulate different pedigree structures along with genomes corresponding to those individuals in a family pedigree. This limits simulation-based evaluations of methods that use pedigrees.
Results: We have developed a Python command-line tool called py_ped_sim that facilitates the simulation of pedigree structures and the genomes of individuals in a pedigree. py_ped_sim represents pedigrees as directed acyclic graphs, enabling conversion between standard pedigree formats and integration with the forward population genetic simulator, SLiM. Notably, py_ped_sim allows the simulation of varying numbers of offspring for a set of parents, with the capacity to shift the distribution of sibship sizes over generations. We additionally support simulating events of misattributed paternity, which offers a way to simulate half-sibling relationships, and simulations that extend the breadth of a family pedigree. We validated the accuracy of both our genome simulator and pedigree simulator. We show that genomes simulated onto family pedigrees exhibit the expected levels of kinship.
Conclusions: py_ped_sim is a user-friendly and open-source solution for simulating pedigree structures and conducting pedigree genome simulations. It empowers medical, forensic, and evolutionary genetics researchers to gain deeper insights into the dynamics of genetic inheritance and relatedness within families.
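To make the "expected kinship" idea in the abstract concrete, here is a minimal Python sketch that stores a pedigree as a directed acyclic graph (each child points to its two parents) and computes expected kinship coefficients with the standard textbook recursion. It is an illustration only and is not taken from py_ped_sim: the toy pedigree, the individual labels, and the names `PEDIGREE` and `make_kinship` are all hypothetical, and founders are assumed to be unrelated and non-inbred.

```python
from functools import lru_cache

# Hypothetical toy pedigree stored as a DAG: child -> (father, mother).
# Individuals that never appear as keys are treated as founders.
PEDIGREE = {
    "C1": ("F", "M"),   # full siblings C1 and C2
    "C2": ("F", "M"),
    "H": ("F", "M2"),   # half sibling of C1 and C2 through the shared father F
    "G": ("C1", "S"),   # grandchild of F and M
}


def make_kinship(pedigree):
    """Return a function computing the expected kinship of two individuals."""

    @lru_cache(maxsize=None)
    def depth(ind):
        # Founders have depth 0; everyone else sits one level below the deeper parent.
        if ind not in pedigree:
            return 0
        return 1 + max(depth(parent) for parent in pedigree[ind])

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a == b:
            if a not in pedigree:
                return 0.5  # non-inbred founder
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        # Recurse through the deeper individual, which cannot be an ancestor of the other.
        if depth(a) < depth(b):
            a, b = b, a
        if a not in pedigree:
            return 0.0  # two distinct founders are assumed unrelated
        father, mother = pedigree[a]
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    return kinship


kin = make_kinship(PEDIGREE)
print(kin("C1", "C2"))  # full siblings: 0.25
print(kin("C1", "H"))   # half siblings: 0.125
print(kin("G", "F"))    # grandchild and grandparent: 0.125
```

Values like these are the pedigree-based expectations against which kinship estimated from simulated genomes could be compared.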
Affiliation(s)
- Miguel Guardado
  - Department of Mathematics, San Francisco State University, San Francisco, CA, 94132, USA
  - Biological and Medical Informatics Graduate Program, University of California San Francisco, San Francisco, CA, 94158, USA
  - Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, 94134, USA
  - Department of Data Science, University of Oregon, Eugene, OR, 97403, USA
- Cynthia Perez
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
- Sthen Campana
  - Department of Data Science, University of Oregon, Eugene, OR, 97403, USA
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
- Berenice Chavez Rojas
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
- Joaquín Magaña
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
- Shalom Jackson
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
- Emily Samperio
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
- Selena Hernandez
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
- Kaela Syas
  - Department of Mathematics, San Francisco State University, San Francisco, CA, 94132, USA
- Ryan D Hernandez
  - Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, 94134, USA
- Elena I Zavala
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Rori V Rohlfs
  - Department of Data Science, University of Oregon, Eugene, OR, 97403, USA
  - Department of Biology, San Francisco State University, San Francisco, CA, 94132, USA
2. Nieuwoudt C, Farooq FB, Brooks-Wilson A, Bureau A, Graham J. Statistics to prioritize rare variants in family-based sequencing studies with disease subtypes. Genet Epidemiol 2024; 48:324-343. [PMID: 38940260 DOI: 10.1002/gepi.22579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 03/26/2024] [Accepted: 06/13/2024] [Indexed: 06/29/2024]
Abstract
Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants (RVs). Since different families can harbor different causal variants and each family harbors many RVs, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, for example, pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent-child trios. Extending this idea to families, we propose methods to prioritize RVs shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known probability of carrying a causal variant. In contrast, local approaches condition on a variant being observed in specific families to eliminate the carrier probability. Our simulation results indicate that global approaches are robust to misspecification of the carrier probability and prioritize more effectively than local approaches even when the carrier probability is misspecified.
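For readers unfamiliar with the transmission-disequilibrium test mentioned above, the following Python sketch computes the classic McNemar-type TDT statistic from counts of transmissions and non-transmissions of an allele by heterozygous parents. It illustrates only the standard trio-based test, not the global or local family-based prioritization statistics proposed in the paper. The function name `tdt_statistic` and the example counts are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classic TDT for one allele.

    transmitted:   times heterozygous parents passed the allele to an affected child
    untransmitted: times heterozygous parents did not pass it on
    Returns (chi-square statistic with 1 degree of freedom, asymptotic p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("at least one informative (heterozygous) parent is required")
    # Under Mendelian transmission each count is expected to be total / 2.
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square variable with 1 df, via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 transmissions versus 10 non-transmissions.
chi2, p_value = tdt_statistic(30, 10)
print(chi2, p_value)
```

A large statistic signals a departure from the 50:50 transmission expected under Mendelian inheritance, which is the signal the test uses to rank variants.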
Affiliation(s)
- Christina Nieuwoudt
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Fabiha Binte Farooq
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Angela Brooks-Wilson
  - Department of Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, British Columbia, Canada
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Alexandre Bureau
  - Département de Médecine Sociale et Préventive, Université Laval, Québec City, Québec, Canada
  - Centre de recherche CERVO, Québec City, Québec, Canada
- Jinko Graham
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
3. Guardado M, Perez C, Jackson S, Magaña J, Campana S, Samperio E, Rojas BC, Hernandez S, Syas K, Hernandez R, Zavala EI, Rohlfs R. py_ped_sim - A flexible forward genetic simulator for complex family pedigree analysis. bioRxiv 2024:2024.03.25.586501. [PMID: 38585824 PMCID: PMC10996500 DOI: 10.1101/2024.03.25.586501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Background: Large-scale family pedigrees are commonly used across medical, evolutionary, and forensic genetics. These pedigrees are tools for identifying genetic disorders, tracking evolutionary patterns, and establishing familial relationships via forensic genetic identification. However, there is a lack of software to accurately simulate different pedigree structures along with genomes corresponding to those individuals in a family pedigree. This limits simulation-based evaluations of methods that use pedigrees.
Results: We have developed a Python command-line tool called py_ped_sim that facilitates the simulation of pedigree structures and the genomes of individuals in a pedigree. py_ped_sim represents pedigrees as directed acyclic graphs, enabling conversion between standard pedigree formats and integration with the forward population genetic simulator, SLiM. Notably, py_ped_sim allows the simulation of varying numbers of offspring for a set of parents, with the capacity to shift the distribution of sibship sizes over generations. We additionally support simulating events of misattributed paternity, which offers a way to simulate half-sibling relationships. We validated the accuracy of our software by simulating genomes onto diverse family pedigree structures, showing that the estimated kinship coefficients closely approximated expected values.
Conclusions: py_ped_sim is a user-friendly and open-source solution for simulating pedigree structures and conducting pedigree genome simulations. It empowers medical, forensic, and evolutionary genetics researchers to gain deeper insights into the dynamics of genetic inheritance and relatedness within families.
Affiliation(s)
- Miguel Guardado
  - San Francisco State University, Department of Mathematics, San Francisco, CA, 94132, USA
  - University of California San Francisco, Biological and Medical Informatics Graduate Program, San Francisco, CA, 94158
  - Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94134, USA
  - University of Oregon, Department of Data Science, Eugene, OR, 97403, USA
- Cynthia Perez
  - San Francisco State University, Department of Biology, San Francisco, CA, 94132, USA
- Shalom Jackson
  - San Francisco State University, Department of Biology, San Francisco, CA, 94132, USA
- Joaquín Magaña
  - San Francisco State University, Department of Biology, San Francisco, CA, 94132, USA
- Sthen Campana
  - San Francisco State University, Department of Biology, San Francisco, CA, 94132, USA
- Emily Samperio
  - San Francisco State University, Department of Biology, San Francisco, CA, 94132, USA
- Selena Hernandez
  - San Francisco State University, Department of Biology, San Francisco, CA, 94132, USA
- Kaela Syas
  - San Francisco State University, Department of Mathematics, San Francisco, CA, 94132, USA
- Ryan Hernandez
  - Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94134, USA
- Elena I. Zavala
  - San Francisco State University, Department of Biology, San Francisco, CA, 94132, USA
  - University of California, Berkeley, Department of Molecular and Cell Biology, Berkeley, CA, 94720, USA
- Rori Rohlfs
  - San Francisco State University, Department of Biology, San Francisco, CA, 94132, USA
  - University of Oregon, Department of Data Science, Eugene, OR, 97403, USA