1
Nixon A, Fang L, Havrilla JM, Wang K. Termviewer - A Web Application for Streamlined Human Phenotype Ontology (HPO) Tagging and Document Annotation. Chem Biodivers 2022; 19:e202200805. [PMID: 36328766 DOI: 10.1002/cbdv.202200805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022]
Abstract
Clinical notes from electronic health records (EHRs) contain a large amount of clinical phenotype data on patients that can provide insights into the phenotypic presentation of various diseases. A number of Natural Language Processing (NLP) algorithms have been utilized in the past few years to annotate medical concepts, such as Human Phenotype Ontology (HPO) terms, from clinical notes. However, efficient use of NLP algorithms requires the use of high-quality clinical notes with phenotype descriptions, and erroneous annotations often exist in results from these NLP algorithms. Manual review by human experts is often needed to compile the correct phenotype information on individual patients. Here we develop TermViewer, a web application that allows multi-party collaborative annotation and quality assessment of clinical notes that have already been processed and tagged by NLP algorithms. TermViewer allows users to view clinical notes with HPO terms highlighted, and to easily classify high-quality notes and revise incorrect tagging of HPO terms. Currently, TermViewer combines MetaMap and cTAKES, two of the most widely used NLP tools for tagging medical terms, and identifies where these two tools agree and disagree, allowing users to perform collaborative manual reviews of computationally generated HPO annotations. TermViewer can be a stand-alone tool for analyzing notes or become part of a machine-learning pipeline where tagged HPO terms can be used as additional input data. TermViewer is available at https://github.com/WGLab/TermViewer.
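The agree/disagree comparison described in the abstract can be illustrated with a minimal sketch. This is not TermViewer's code and it does not call MetaMap or cTAKES; the `Annotation` record, the `compare_annotations` helper, and the hand-written tagger outputs are all assumptions made for illustration, and an exact-match rule on span and HPO ID is only one possible definition of agreement.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical record for one tagged span in a clinical note."""
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)
    hpo_id: str  # HPO identifier assigned by the tagger


def compare_annotations(tool_a, tool_b):
    """Split two taggers' annotations into agreed and tool-specific sets.

    Two annotations agree only if span and HPO ID match exactly.
    """
    set_a, set_b = set(tool_a), set(tool_b)
    return {
        "agree": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


note = "Patient presents with seizures and short stature."

# Hand-written example annotations, not real output from any NLP tool.
tagger_a = [Annotation(22, 30, "HP:0001250"), Annotation(35, 48, "HP:0004322")]
tagger_b = [Annotation(22, 30, "HP:0001250")]

result = compare_annotations(tagger_a, tagger_b)
for label, annotations in result.items():
    for ann in annotations:
        print(label, ann.hpo_id, repr(note[ann.start:ann.end]))
```

In a review workflow, the spans listed under `only_a` and `only_b` would be natural candidates for manual inspection, since only one tagger produced them.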
Affiliation(s)
- Anna Nixon
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- James M Havrilla
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
2
Havrilla JM, Singaravelu A, Driscoll DM, Minkovsky L, Helbig I, Medne L, Wang K, Krantz I, Desai BR. PheNominal: an EHR-integrated web application for structured deep phenotyping at the point of care. BMC Med Inform Decis Mak 2022; 22:198. [PMID: 35902925 PMCID: PMC9335954 DOI: 10.1186/s12911-022-01927-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/13/2022] [Accepted: 07/06/2022] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Clinical phenotype information greatly facilitates genetic diagnostic interpretation pipelines in disease. While post-hoc extraction using natural language processing on unstructured clinical notes continues to improve, there is a need to improve point-of-care collection of patient phenotypes. Therefore, we developed "PheNominal", a point-of-care web application, embedded within Epic electronic health record (EHR) workflows, to permit capture of standardized phenotype data. METHODS Using bi-directional web services available within commercial EHRs, we developed a lightweight web application that allows users to rapidly browse and identify relevant terms from the Human Phenotype Ontology (HPO). Selected terms are saved discretely within the patient's EHR, permitting reuse both in clinical notes and in downstream diagnostic and research pipelines. RESULTS In the 16 months since implementation, PheNominal was used to capture discrete phenotype data for over 1500 individuals and 11,000 HPO terms during clinic and inpatient encounters for a genetic diagnostic consultation service within a quaternary-care pediatric academic medical center. An average of 7 HPO terms were captured per patient. Compared to a manual workflow, the average time to enter terms was reduced from 15 to 5 min per patient, and there were fewer annotation errors. CONCLUSIONS Modern EHRs support integration of external applications using application programming interfaces. We describe a practical application of these interfaces to facilitate deep phenotype capture in a discrete, structured format within a busy clinical workflow. Future versions will include a vendor-agnostic implementation using FHIR. We describe pilot efforts to integrate structured phenotyping through controlled dictionaries into diagnostic and research pipelines, reducing manual effort for phenotype documentation and reducing errors in data entry.
Affiliation(s)
- James M. Havrilla
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Anbumalar Singaravelu
  - Emerging Technology and Transformation Team, Information Services, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Dennis M. Driscoll
  - Emerging Technology and Transformation Team, Information Services, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Leonard Minkovsky
  - Emerging Technology and Transformation Team, Information Services, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Ingo Helbig
  - Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children’s Hospital of Philadelphia, Philadelphia, USA
  - Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
  - Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104 USA
- Livija Medne
  - Roberts Individualized Medical Genetics Center, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Kai Wang
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
  - Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA
- Ian Krantz
  - Roberts Individualized Medical Genetics Center, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Bimal R. Desai
  - Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA
3
Havrilla JM, Liu C, Dong X, Weng C, Wang K. PhenCards: a data resource linking human phenotype information to biomedical knowledge. Genome Med 2021; 13:91. [PMID: 34034817 PMCID: PMC8147460 DOI: 10.1186/s13073-021-00909-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/03/2021] [Accepted: 05/13/2021] [Indexed: 02/07/2023] Open
Abstract
We present PhenCards (https://phencards.org), a database and web server intended as a one-stop shop for previously disconnected biomedical knowledge related to human clinical phenotypes. Users can query human phenotype terms or clinical notes. PhenCards obtains relevant disease/phenotype prevalence and co-occurrence, drug, procedural, pathway, literature, grant, and collaborator data. PhenCards recommends the most probable genetic diseases and candidate genes based on phenotype terms from clinical notes. PhenCards facilitates exploration of phenotype, e.g., which drugs cause or are prescribed for patient symptoms, which genes likely cause specific symptoms, and which comorbidities co-occur with phenotypes.
Affiliation(s)
- James M Havrilla
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Cong Liu
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Xiangchen Dong
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
4
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/23/2021] [Indexed: 11/20/2022] Open
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on different related topics on Structural Variation, Pan-genomes, and SARS-CoV-2 related research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotations and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Affiliation(s)
- Fawaz Dabbaghie
  - Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Ahmed Arslan
  - Stanford University School of Medicine, California, USA
- Daniel L Cameron
  - Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Joyjit Daw
  - NVIDIA Corporation, Santa Clara, California, USA
- Haowei Du
  - Baylor College of Medicine, Houston, USA
- Jean Monlong
  - UC Santa Cruz Genomics Institute, Santa Cruz, USA
- Arda Soylev
  - Konya Food and Agriculture University, Konya, Turkey
- Pankaj Vats
  - NVIDIA Corporation, Santa Clara, California, USA
- Qiandong Zeng
  - Laboratory Corporation of America Holdings, Westborough, USA
5
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 03/04/2021] [Indexed: 11/08/2023] Open
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on different related topics on Structural Variation, Pan-genomes, and SARS-CoV-2 related research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotations and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Affiliation(s)
- Fawaz Dabbaghie
  - Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Ahmed Arslan
  - Stanford University School of Medicine, California, USA
- Daniel L Cameron
  - Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Joyjit Daw
  - NVIDIA Corporation, Santa Clara, California, USA
- Haowei Du
  - Baylor College of Medicine, Houston, USA
- Jean Monlong
  - UC Santa Cruz Genomics Institute, Santa Cruz, USA
- Arda Soylev
  - Konya Food and Agriculture University, Konya, Turkey
- Pankaj Vats
  - NVIDIA Corporation, Santa Clara, California, USA
- Qiandong Zeng
  - Laboratory Corporation of America Holdings, Westborough, USA
6
Zhao M, Havrilla JM, Fang L, Chen Y, Peng J, Liu C, Wu C, Sarmady M, Botas P, Isla J, Lyon GJ, Weng C, Wang K. Phen2Gene: rapid phenotype-driven gene prioritization for rare diseases. NAR Genom Bioinform 2020; 2:lqaa032. [PMID: 32500119 PMCID: PMC7252576 DOI: 10.1093/nargab/lqaa032] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 12/16/2019] [Revised: 04/10/2020] [Accepted: 04/28/2020] [Indexed: 02/07/2023] Open
Abstract
Human Phenotype Ontology (HPO) terms are increasingly used in diagnostic settings to aid in the characterization of patient phenotypes. The HPO annotation database is updated frequently and can provide detailed phenotype knowledge on various human diseases, and many HPO terms are now mapped to candidate causal genes with binary relationships. To further improve the genetic diagnosis of rare diseases, we incorporated these HPO annotations, gene-disease databases and gene-gene databases in a probabilistic model to build a novel HPO-driven gene prioritization tool, Phen2Gene. Phen2Gene accesses a database built upon this information called the HPO2Gene Knowledgebase (H2GKB), which provides weighted and ranked gene lists for every HPO term. Phen2Gene is then able to access the H2GKB for patient-specific lists of HPO terms or PhenoPacket descriptions supported by GA4GH (http://phenopackets.org/), calculate a prioritized gene list based on a probabilistic model and output gene-disease relationships with great accuracy. Phen2Gene outperforms existing gene prioritization tools in speed and acts as a real-time phenotype-driven gene prioritization tool to aid the clinical diagnosis of rare undiagnosed diseases. In addition to a command line tool released under the MIT license (https://github.com/WGLab/Phen2Gene), we also developed a web server and web service (https://phen2gene.wglab.org/) for running the tool via web interface or RESTful API queries. Finally, we have curated a large amount of benchmarking data for phenotype-to-gene tools involving 197 patients across 76 scientific articles and 85 patients' de-identified HPO term data from the Children's Hospital of Philadelphia.
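The idea of turning per-term weighted gene lists into one patient-level ranking can be sketched as follows. This is a deliberately simplified illustration, not Phen2Gene's probabilistic model and not the H2GKB: the `TOY_KB` contents, the placeholder gene names, the weights, and the additive scoring rule are all assumptions chosen to keep the example small.

```python
from collections import defaultdict

# Toy stand-in for a per-term knowledgebase: HPO term -> {gene: weight}.
# Gene names and weights are invented placeholders, not real associations.
TOY_KB = {
    "HP:0001250": {"GENE_A": 0.75, "GENE_B": 0.25},
    "HP:0001263": {"GENE_A": 0.5, "GENE_C": 0.5},
}


def prioritize(hpo_terms, kb):
    """Rank genes by the sum of their weights across a patient's HPO terms.

    Terms missing from the knowledgebase contribute nothing. Ties are
    broken alphabetically so the ordering is deterministic.
    """
    scores = defaultdict(float)
    for term in hpo_terms:
        for gene, weight in kb.get(term, {}).items():
            scores[gene] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


patient_terms = ["HP:0001250", "HP:0001263"]
ranked = prioritize(patient_terms, TOY_KB)
for gene, score in ranked:
    print(gene, score)
```

A gene supported by several of the patient's terms accumulates weight from each of them, which is why `GENE_A` ranks first here.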
Affiliation(s)
- Mengge Zhao
  - Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- James M Havrilla
  - Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Li Fang
  - Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Ying Chen
  - Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jacqueline Peng
  - Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cong Liu
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY 10032, USA
- Chao Wu
  - Division of Genomic Diagnostics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Mahdi Sarmady
  - Division of Genomic Diagnostics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Pablo Botas
  - Foundation 29, Pozuelo de Alarcon, 28223 Madrid, Spain
- Julián Isla
  - Foundation 29, Pozuelo de Alarcon, 28223 Madrid, Spain
  - Dravet Syndrome European Federation, 29200 Brest, France
- Gholson J Lyon
  - Institute for Basic Research in Developmental Disabilities (IBR), Staten Island, NY 10314, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY 10032, USA
- Kai Wang
  - Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
7
Boukas L, Havrilla JM, Hickey PF, Quinlan AR, Bjornsson HT, Hansen KD. Coexpression patterns define epigenetic regulators associated with neurological dysfunction. Genome Res 2019; 29:532-542. [PMID: 30858344 PMCID: PMC6442390 DOI: 10.1101/gr.239442.118] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 05/14/2018] [Accepted: 02/07/2019] [Indexed: 01/12/2023]
Abstract
Coding variants in epigenetic regulators are emerging as causes of neurological dysfunction and cancer. However, a comprehensive effort to identify disease candidates within the human epigenetic machinery (EM) has not been performed; it is unclear whether features exist that distinguish between variation-intolerant and variation-tolerant EM genes, and between EM genes associated with neurological dysfunction versus cancer. Here, we rigorously define 295 genes with a direct role in epigenetic regulation (writers, erasers, remodelers, readers). Systematic exploration of these genes reveals that although individual enzymatic functions are always mutually exclusive, readers often also exhibit enzymatic activity (dual-function EM genes). We find that the majority of EM genes are very intolerant to loss-of-function variation, even when compared to the dosage sensitive transcription factors, and we identify 102 novel EM disease candidates. We show that this variation intolerance is driven by the protein domains encoding the epigenetic function, suggesting that disease is caused by a perturbed chromatin state. We then describe a large subset of EM genes that are coexpressed within multiple tissues. This subset is almost exclusively populated by extremely variation-intolerant genes and shows enrichment for dual-function EM genes. It is also highly enriched for genes associated with neurological dysfunction, even when accounting for dosage sensitivity, but not for cancer-associated EM genes. Finally, we show that regulatory regions near epigenetic regulators are genetically important for common neurological traits. These findings prioritize novel disease candidate EM genes and suggest that this coexpression plays a functional role in normal neurological homeostasis.
Affiliation(s)
- Leandros Boukas
  - Human Genetics Training Program, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- James M Havrilla
  - Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
- Peter F Hickey
  - Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
- Aaron R Quinlan
  - Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84108, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah 84108, USA
- Hans T Bjornsson
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
  - Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, USA
  - Faculty of Medicine, University of Iceland, 101 Reykjavík, Iceland
  - Landspitali University Hospital, 101 Reykjavík, Iceland
- Kasper D Hansen
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
8
Havrilla JM, Pedersen BS, Layer RM, Quinlan AR. A map of constrained coding regions in the human genome. Nat Genet 2018; 51:88-95. [PMID: 30531870 DOI: 10.1038/s41588-018-0294-6] [Citation(s) in RCA: 143] [Impact Index Per Article: 23.8] [Received: 12/14/2017] [Accepted: 10/29/2018] [Indexed: 12/13/2022]
Abstract
Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.
Affiliation(s)
- James M Havrilla
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Ryan M Layer
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Computer Science, University of Colorado, Boulder, CO, USA
- Aaron R Quinlan
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
9
Belyeu JR, Nicholas TJ, Pedersen BS, Sasani TA, Havrilla JM, Kravitz SN, Conway ME, Lohman BK, Quinlan AR, Layer RM. SV-plaudit: A cloud-based framework for manually curating thousands of structural variants. Gigascience 2018; 7:5026174. [PMID: 29860504 PMCID: PMC6030999 DOI: 10.1093/gigascience/giy064] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/20/2018] [Accepted: 05/25/2018] [Indexed: 01/21/2023] Open
Abstract
SV-plaudit is a framework for rapidly curating structural variant (SV) predictions. For each SV, we generate an image that visualizes the coverage and alignment signals from a set of samples. Images are uploaded to our cloud framework where users assess the quality of each image using a client-side web application. Reports can then be generated as a tab-delimited file or annotated Variant Call Format (VCF) file. As a proof of principle, nine researchers collaborated for 1 hour to evaluate 1,350 SVs each. We anticipate that SV-plaudit will become a standard step in variant calling pipelines and the crowd-sourced curation of other biological results. Code available at https://github.com/jbelyeu/SV-plaudit. Demonstration video available at https://www.youtube.com/watch?v=ono8kHMKxDs.
Affiliation(s)
- Jonathan R Belyeu
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Thomas J Nicholas
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Thomas A Sasani
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- James M Havrilla
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Stephanie N Kravitz
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Megan E Conway
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
- Brian K Lohman
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Aaron R Quinlan
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Ryan M Layer
  - Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
  - USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA