1
Alsentzer E, Finlayson SG, Li MM, Kobren SN, Kohane IS. Simulation of undiagnosed patients with novel genetic conditions. Nat Commun 2023; 14:6403. [PMID: 37828001 PMCID: PMC10570269 DOI: 10.1038/s41467-023-41980-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Accepted: 09/26/2023] [Indexed: 10/14/2023] Open
Abstract
Rare Mendelian disorders pose a major diagnostic challenge and collectively affect 300-400 million patients worldwide. Many automated tools aim to uncover causal genes in patients with suspected genetic disorders, but evaluation of these tools is limited by the lack of comprehensive benchmark datasets that include previously unpublished conditions. Here, we present a computational pipeline that simulates realistic clinical datasets to address this deficit. Our framework jointly simulates complex phenotypes and challenging candidate genes and produces patients with novel genetic conditions. We demonstrate the similarity of our simulated patients to real patients from the Undiagnosed Diseases Network and evaluate common gene prioritization methods on the simulated cohort. These prioritization methods recover known gene-disease associations but perform poorly on diagnosing patients with novel genetic disorders. Our publicly available dataset and codebase can be utilized by medical genetics researchers to evaluate, compare, and improve tools that aid in the diagnostic process.
Grants
- U01 HG007690 NHGRI NIH HHS
- U54 NS108251 NINDS NIH HHS
- U01 HG010219 NHGRI NIH HHS
- U01 HG007672 NHGRI NIH HHS
- U01 HG010233 NHGRI NIH HHS
- U01 HG010230 NHGRI NIH HHS
- U01 HG007943 NHGRI NIH HHS
- U01 HG010217 NHGRI NIH HHS
- U01 HG007942 NHGRI NIH HHS
- U01 HG010215 NHGRI NIH HHS
- U01 HG007708 NHGRI NIH HHS
- T32 HG002295 NHGRI NIH HHS
- T32 GM007753 NIGMS NIH HHS
- U01 HG007674 NHGRI NIH HHS
- U01 TR001395 NCATS NIH HHS
- U01 HG007709 NHGRI NIH HHS
- U54 NS093793 NINDS NIH HHS
- U01 HG007530 NHGRI NIH HHS
- U01 TR002471 NCATS NIH HHS
- U01 HG007703 NHGRI NIH HHS
- UDN research reported in this manuscript was supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under Award Number(s) U01HG007709, U01HG010219, U01HG010230, U01HG010217, U01HG010233, U01HG010215, U01HG007672, U01HG007690, U01HG007708, U01HG007703, U01HG007674, U01HG007530, U01HG007942, U01HG007943, U01TR001395, U01TR002471, U54NS108251, and U54NS093793.
- E.A. is supported by a Microsoft Research PhD Fellowship.
- S.F. is supported by award number T32GM007753 from the National Institute of General Medical Sciences.
- M.L. is supported by T32HG002295 from the National Human Genome Research Institute and a National Science Foundation Graduate Research Fellowship.
Affiliation(s)
- Emily Alsentzer
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Program in Health Sciences and Technology, MIT, Cambridge, MA, 02139, USA
- Samuel G Finlayson
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Program in Health Sciences and Technology, MIT, Cambridge, MA, 02139, USA
  - Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA, 98105, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98105, USA
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, 02115, USA
- Shilpa N Kobren
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
2
Tosco-Herrera E, Muñoz-Barrera A, Jáspez D, Rubio-Rodríguez LA, Mendoza-Alvarez A, Rodriguez-Perez H, Jou J, Iñigo-Campos A, Corrales A, Ciuffreda L, Martinez-Bugallo F, Prieto-Morin C, García-Olivares V, González-Montelongo R, Lorenzo-Salazar JM, Marcelino-Rodriguez I, Flores C. Evaluation of a whole-exome sequencing pipeline and benchmarking of causal germline variant prioritizers. Hum Mutat 2022; 43:2010-2020. [PMID: 36054330 DOI: 10.1002/humu.24459] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/29/2022] [Revised: 08/20/2022] [Accepted: 08/30/2022] [Indexed: 01/25/2023]
Abstract
Most causal variants of Mendelian diseases are exonic. Whole-exome sequencing (WES) has become the diagnostic gold standard, but causative variant prioritization constitutes a bottleneck. Here we assessed an in-house sample-to-sequence pipeline and benchmarked free prioritization tools for germline causal variants from WES data. WES of 61 unselected patients with a known genetic disease cause was obtained. Variant prioritizations were performed by diverse tools and recorded to obtain a diagnostic yield when the causal variant was present in the first, fifth, and 10th top rankings. A fraction of causal variants was not captured by WES (8.2%) or did not pass the quality control criteria (13.1%). Most of the applications inspected were unavailable or had technical limitations, leaving nine tools for complete evaluation. Exomiser performed best in the top first rankings, while LIRICAL led in the top fifth rankings. Based on the more conservative top 10th rankings, Xrare had the highest diagnostic yield, followed by a three-way tie among Exomiser, LIRICAL, and PhenIX, then followed by AMELIE, TAPES, Phen-Gen, AIVar, and VarNote-PAT. Xrare, Exomiser, LIRICAL, and PhenIX are the most efficient options for variant prioritization in real patient WES data.
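The top-k diagnostic yield described in this abstract can be illustrated with a minimal sketch. It assumes each case is summarized by the rank a tool assigned to the known causal variant, with `None` when the variant was not ranked at all (for example, not captured by WES or removed by quality control). The function name `top_k_yield` and the example ranks are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def top_k_yield(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose causal variant is ranked within the top k.

    ranks: 1-based rank of the causal variant per case; None means the
           variant was never ranked (assumption: such cases count as misses).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for six cases (illustrative only, not study data).
example_ranks = [1, 3, 7, None, 12, 1]

# Yield at the three cutoffs named in the abstract: top 1, top 5, top 10.
yields = {k: top_k_yield(example_ranks, k) for k in (1, 5, 10)}
for k, value in yields.items():
    print(f"top-{k} yield: {value:.1%}")
```

Counting unranked cases in the denominator is one possible design choice; excluding them would report yield only over variants that reached the prioritization step.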
Affiliation(s)
- Eva Tosco-Herrera
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
  - Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
- Adrián Muñoz-Barrera
  - Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- David Jáspez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- Luis A Rubio-Rodríguez
  - Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- Alejandro Mendoza-Alvarez
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
  - Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
- Hector Rodriguez-Perez
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
  - Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
- Jonathan Jou
  - Department of Surgery, University of Illinois College of Medicine, Peoria, Illinois, USA
- Antonio Iñigo-Campos
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- Almudena Corrales
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
  - CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
- Laura Ciuffreda
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
- Francisco Martinez-Bugallo
  - Clinical Analysis Service, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
- Carol Prieto-Morin
  - Clinical Analysis Service, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
- Víctor García-Olivares
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- Jose Miguel Lorenzo-Salazar
  - Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- Carlos Flores
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
  - CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
  - Facultad de Ciencias de la Salud, Universidad Fernando Pessoa Canarias, Las Palmas de Gran Canaria, Spain