1
Clarke DJ, Evangelista JE, Xie Z, Marino GB, Byrd AI, Maurya MR, Srinivasan S, Yu K, Petrosyan V, Roth ME, Milinkov M, King CH, Vora JK, Keeney J, Nemarich C, Khan W, Lachmann A, Ahmed N, Agris A, Pan J, Ramachandran S, Fahy E, Esquivel E, Mihajlovic A, Jevtic B, Milinovic V, Kim S, McNeely P, Wang T, Wenger E, Brown MA, Sickler A, Zhu Y, Jenkins SL, Blood PD, Taylor DM, Resnick AC, Mazumder R, Milosavljevic A, Subramaniam S, Ma’ayan A. Playbook workflow builder: Interactive construction of bioinformatics workflows. PLoS Comput Biol 2025; 21:e1012901. [PMID: 40179105 PMCID: PMC11967941 DOI: 10.1371/journal.pcbi.1012901] [Received: 08/25/2024] [Accepted: 02/24/2025] [Indexed: 04/05/2025]
Abstract
The Playbook Workflow Builder (PWB) is a web-based platform to dynamically construct and execute bioinformatics workflows by utilizing a growing network of input datasets, semantically annotated API endpoints, and data visualization tools contributed by an ecosystem of collaborators. Through a user-friendly interface, workflows can be constructed from contributed building blocks without technical expertise. The output of each step of the workflow is added into reports containing textual descriptions, figures, tables, and references. To construct workflows, users can click on cards that represent each step in a workflow, or construct workflows via a chat interface that is assisted by a large language model (LLM). Completed workflows are compatible with Common Workflow Language (CWL) and can be published as research publications, slideshows, and posters. To demonstrate how the PWB generates meaningful hypotheses that draw knowledge from across multiple resources, we present several use cases. For example, one of these use cases prioritizes drug targets for individual cancer patients using data from the NIH Common Fund programs GTEx, LINCS, Metabolomics, GlyGen, and ExRNA. The workflows created with PWB can be repurposed to tackle similar use cases using different inputs. The PWB platform is available from: https://playbook-workflow-builder.cloud/.
Affiliation(s)
- Daniel J.B. Clarke
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- John Erol Evangelista
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Zhuorui Xie
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Giacomo B. Marino
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Anna I. Byrd
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Mano R. Maurya
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Sumana Srinivasan
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Keyang Yu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Varduhi Petrosyan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Matthew E. Roth
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Charles Hadley King
- Department of Biochemistry and Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, United States of America
- Jeet Kiran Vora
- Department of Biochemistry and Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, United States of America
- Jonathon Keeney
- Department of Biochemistry and Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, United States of America
- Christopher Nemarich
- Department of Biomedical and Health Informatics; Department of Pediatrics, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Center for Data Driven Discovery in Biomedicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- William Khan
- Department of Biomedical and Health Informatics; Department of Pediatrics, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Center for Data Driven Discovery in Biomedicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Alexander Lachmann
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nasheath Ahmed
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Alexandra Agris
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Juncheng Pan
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Srinivasan Ramachandran
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Eoin Fahy
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Emmanuel Esquivel
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Bosko Jevtic
- Persida Inc., Brooklyn, New York, United States of America
- Vuk Milinovic
- Persida Inc., Brooklyn, New York, United States of America
- Sean Kim
- Department of Biochemistry and Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, United States of America
- Patrick McNeely
- Department of Biochemistry and Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, United States of America
- Tianyi Wang
- Department of Biochemistry and Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, United States of America
- Eric Wenger
- Department of Biomedical and Health Informatics; Department of Pediatrics, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Center for Data Driven Discovery in Biomedicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Miguel A. Brown
- Department of Biomedical and Health Informatics; Department of Pediatrics, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Center for Data Driven Discovery in Biomedicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Alexander Sickler
- Department of Biomedical and Health Informatics; Department of Pediatrics, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Center for Data Driven Discovery in Biomedicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Yuankun Zhu
- Department of Biomedical and Health Informatics; Department of Pediatrics, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Center for Data Driven Discovery in Biomedicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Sherry L. Jenkins
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Philip D. Blood
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Deanne M. Taylor
- Department of Biomedical and Health Informatics; Department of Pediatrics, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Center for Data Driven Discovery in Biomedicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Adam C. Resnick
- Department of Biomedical and Health Informatics; Department of Pediatrics, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Center for Data Driven Discovery in Biomedicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, United States of America
- Aleksandar Milosavljevic
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Shankar Subramaniam
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Avi Ma’ayan
- Department of Pharmacological Sciences, Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
3
Qin G, Narsinh K, Wei Q, Roach JC, Joshi A, Goetz SL, Moxon ST, Brush MH, Xu C, Yao Y, Glen AK, Morris ED, Ralevski A, Roper R, Belhu B, Zhang Y, Shmulevich I, Hadlock J, Glusman G. Generating Biomedical Knowledge Graphs from Knowledge Bases, Registries, and Multiomic Data. bioRxiv 2024:2024.11.14.623648. [PMID: 39605475 PMCID: PMC11601480 DOI: 10.1101/2024.11.14.623648] [Indexed: 11/29/2024]
Abstract
As large clinical and multiomics datasets and knowledge resources accumulate, they need to be transformed into computable and actionable information to support automated reasoning. These datasets range from laboratory experiment results to electronic health records (EHRs). Barriers to accessibility and sharing of such datasets include diversity of content, size and privacy. Effective transformation of data into information requires harmonization of stakeholder goals, implementation, enforcement of standards regarding quality and completeness, and availability of resources for maintenance and updates. Systems such as the Biomedical Data Translator leverage knowledge graphs (KGs), structured and machine learning readable knowledge representation, to encode knowledge extracted through inference. We focus here on the transformation of data from multiomics datasets and EHRs into compact knowledge, represented in a KG data structure. We demonstrate this data transformation in the context of the Translator ecosystem, including clinical trials, drug approvals, cancer, wellness, and EHR data. These transformations preserve individual privacy. We provide access to the five resulting KGs through the Translator framework. We show examples of biomedical research questions supported by our KGs, and discuss issues arising from extracting biomedical knowledge from multiomics data.
Affiliation(s)
- Guangrong Qin
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Kamileh Narsinh
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Qi Wei
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Jared C. Roach
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Arpita Joshi
- The Scripps Research Institute, 10550 N Torrey Pines Rd, La Jolla, CA 92037, USA
- Skye L. Goetz
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Sierra T. Moxon
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Matthew H. Brush
- UNC Chapel Hill, Department of Genetics, 120 Mason Farm Rd, Chapel Hill, NC 27599, USA
- Colleen Xu
- The Scripps Research Institute, 10550 N Torrey Pines Rd, La Jolla, CA 92037, USA
- Yao Yao
- Oregon State University, 1500 SW Jefferson Way, Corvallis, OR 97331, USA
- Amy K. Glen
- Oregon State University, 1500 SW Jefferson Way, Corvallis, OR 97331, USA
- Evan D. Morris
- Renaissance Computing Institute, 100 Europa Dr, Ste 540, Chapel Hill, NC 27517, USA
- Ryan Roper
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Basazin Belhu
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Yue Zhang
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Ilya Shmulevich
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Jennifer Hadlock
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Gwênlyn Glusman
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
4
Ma C, Liu S, Koslicki D. MetagenomicKG: a knowledge graph for metagenomic applications. bioRxiv 2024:2024.03.14.585056. [PMID: 38559251 PMCID: PMC10980061 DOI: 10.1101/2024.03.14.585056] [Indexed: 04/04/2024]
Abstract
Motivation: The sheer volume and variety of genomic content within microbial communities makes metagenomics a field rich in biomedical knowledge. To traverse these complex communities and their vast unknowns, metagenomic studies often depend on distinct reference databases, such as the Genome Taxonomy Database (GTDB), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), for various analytical purposes. These databases are crucial for genetic and functional annotation of microbial communities. Nevertheless, the inconsistent nomenclature or identifiers of these databases present challenges for effective integration, representation, and utilization. Knowledge graphs (KGs) offer an appropriate solution by organizing biological entities and their interrelations into a cohesive network. The graph structure not only facilitates the unveiling of hidden patterns but also enriches our biological understanding with deeper insights. Despite KGs having shown potential in various biomedical fields, their application in metagenomics remains underexplored. Results: We present MetagenomicKG, a novel knowledge graph specifically tailored for metagenomic analysis. MetagenomicKG integrates taxonomic, functional, and pathogenesis-related information from widely used databases, and further links these with established biomedical knowledge graphs to expand biological connections. Through several use cases, we demonstrate its utility in enabling hypothesis generation regarding the relationships between microbes and diseases, generating sample-specific graph embeddings, and providing robust pathogen prediction. Availability and Implementation: The source code and technical details for constructing the MetagenomicKG and reproducing all analyses are available on GitHub at https://github.com/KoslickiLab/MetagenomicKG. We also host a Neo4j instance at http://mkg.cse.psu.edu:7474 for accessing and querying this graph.
Affiliation(s)
- Chunyu Ma
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- Shaopeng Liu
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- David Koslicki
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania, USA
- Department of Biology, Pennsylvania State University, State College, Pennsylvania, USA
- The One Health Microbiome Center, Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA