1
Bi X, Cheng Y, Lv X, Liu Y, Li J, Du G, Chen J, Liu L. A Multi-Omics, Machine Learning-Aware, Genome-Wide Metabolic Model of Bacillus Subtilis Refines the Gene Expression and Cell Growth Prediction. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2408705. [PMID: 39287062 PMCID: PMC11558093 DOI: 10.1002/advs.202408705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Indexed: 09/19/2024]
Abstract
Given the extensive heterogeneity and variability, understanding cellular functions and regulatory mechanisms through the analysis of multi-omics datasets becomes extremely challenging. Here, a comprehensive modeling framework combining multi-omics machine learning and metabolic network models is proposed that covers various cellular biological processes across multiple scales. The model is validated on an extensive normalized compendium of Bacillus subtilis data, which encompasses gene expression data from environmental perturbations, transcriptional regulation, signal transduction, protein translation, and growth measurements. Comparison with high-throughput experimental data shows that EM_iBsu1209-ME, constructed on this basis, can accurately predict the expression of 605 genes and the synthesis of 23 metabolites under different conditions. This study paves the way for the construction of comprehensive biological databases and high-performance multi-omics metabolic models to achieve accurate predictive analysis in exploring complex mechanisms of cell genotypes and phenotypes.
Affiliation(s)
- Xinyu Bi
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yang Cheng
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Xueqin Lv
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yanfeng Liu
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Jianghua Li
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Guocheng Du
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Jian Chen
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Long Liu
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
2
Erdem C, Gross SM, Heiser LM, Birtwistle MR. MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms. Nat Commun 2023; 14:3991. [PMID: 37414767 PMCID: PMC10326020 DOI: 10.1038/s41467-023-39729-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 06/27/2023] [Indexed: 07/08/2023]
Abstract
Robust identification of context-specific network features that control cellular phenotypes remains a challenge. We here introduce MOBILE (Multi-Omics Binary Integration via Lasso Ensembles) to nominate molecular features associated with cellular phenotypes and pathways. First, we use MOBILE to nominate mechanisms of interferon-γ (IFNγ) regulated PD-L1 expression. Our analyses suggest that IFNγ-controlled PD-L1 expression involves BST2, CLIC2, FAM83D, ACSL5, and HIST2H2AA3 genes, which were supported by prior literature. We also compare networks activated by related family members transforming growth factor-beta 1 (TGFβ1) and bone morphogenetic protein 2 (BMP2) and find that differences in ligand-induced changes in cell size and clustering properties are related to differences in laminin/collagen pathway activity. Finally, we demonstrate the broad applicability and adaptability of MOBILE by analyzing publicly available molecular datasets to investigate breast cancer subtype specific networks. Given the ever-growing availability of multi-omics datasets, we envision that MOBILE will be broadly useful for identification of context-specific molecular features and pathways.
Affiliation(s)
- Cemal Erdem
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Sean M Gross
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Laura M Heiser
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Marc R Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Department of Bioengineering, Clemson University, Clemson, SC, USA
3
Ansari S, Gergely ZR, Flynn P, Li G, Moore JK, Betterton MD. Quantifying Yeast Microtubules and Spindles Using the Toolkit for Automated Microtubule Tracking (TAMiT). Biomolecules 2023; 13:939. [PMID: 37371519 DOI: 10.3390/biom13060939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 02/07/2023] [Accepted: 02/09/2023] [Indexed: 06/29/2023]
Abstract
Fluorescently labeled proteins absorb and emit light, appearing as Gaussian spots in fluorescence imaging. When fluorescent tags are added to cytoskeletal polymers such as microtubules, a line of fluorescence and even non-linear structures results. While much progress has been made in techniques for imaging and microscopy, image analysis is less well-developed. Current analysis of fluorescent microtubules uses either manual tools, such as kymographs, or automated software. As a result, our ability to quantify microtubule dynamics and organization from light microscopy remains limited. Despite the development of automated microtubule analysis tools for in vitro studies, analysis of images from cells often depends heavily on manual analysis. One of the main reasons for this disparity is the low signal-to-noise ratio in cells, where background fluorescence is typically higher than in reconstituted systems. Here, we present the Toolkit for Automated Microtubule Tracking (TAMiT), which automatically detects, optimizes, and tracks fluorescent microtubules in living yeast cells with sub-pixel accuracy. Using basic information about microtubule organization, TAMiT detects linear and curved polymers using a geometrical scanning technique. Images are fit via an optimization problem for the microtubule image parameters that is solved using non-linear least squares in Matlab. We benchmark our software using simulated images and show that it reliably detects microtubules, even at low signal-to-noise ratios. Then, we use TAMiT to measure monopolar spindle microtubule bundle number, length, and lifetime in a large dataset that includes several S. pombe mutants that affect microtubule dynamics and bundling. The results from the automated analysis are consistent with previous work and suggest a direct role for CLASP/Cls1 in bundling spindle microtubules. We also illustrate automated tracking of single curved astral microtubules in S. cerevisiae, with measurement of dynamic instability parameters. The results obtained with our fully-automated software are similar to results using hand-tracked measurements. Therefore, TAMiT can facilitate automated analysis of spindle and microtubule dynamics in yeast cells.
Affiliation(s)
- Saad Ansari
- Department of Physics, University of Colorado Boulder, Boulder, CO 80309, USA
- Zachary R Gergely
- Department of Physics, University of Colorado Boulder, Boulder, CO 80309, USA
- Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO 80309, USA
- Patrick Flynn
- Department of Physics, University of Colorado Boulder, Boulder, CO 80309, USA
- Gabriella Li
- Department of Cell and Developmental Biology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Jeffrey K Moore
- Department of Cell and Developmental Biology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Meredith D Betterton
- Department of Physics, University of Colorado Boulder, Boulder, CO 80309, USA
- Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO 80309, USA
4
Ansari S, Gergely ZR, Flynn P, Li G, Moore JK, Betterton MD. Quantifying yeast microtubules and spindles using the Toolkit for Automated Microtubule Tracking (TAMiT). bioRxiv 2023:2023.02.07.527544. [PMID: 36798368 PMCID: PMC9934621 DOI: 10.1101/2023.02.07.527544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Abstract
Fluorescently labeled proteins absorb and emit light, appearing as Gaussian spots in fluorescence imaging. When fluorescent tags are added to cytoskeletal polymers such as microtubules, a line of fluorescence and even non-linear structures results. While much progress has been made in techniques for imaging and microscopy, image analysis is less well developed. Current analysis of fluorescent microtubules uses either manual tools, such as kymographs, or automated software. As a result, our ability to quantify microtubule dynamics and organization from light microscopy remains limited. Despite development of automated microtubule analysis tools for in vitro studies, analysis of images from cells often depends heavily on manual analysis. One of the main reasons for this disparity is the low signal-to-noise ratio in cells, where background fluorescence is typically higher than in reconstituted systems. Here, we present the Toolkit for Automated Microtubule Tracking (TAMiT), which automatically detects, optimizes and tracks fluorescent microtubules in living yeast cells with sub-pixel accuracy. Using basic information about microtubule organization, TAMiT detects linear and curved polymers using a geometrical scanning technique. Images are fit via an optimization problem for the microtubule image parameters that is solved using non-linear least squares in Matlab. We benchmark our software using simulated images and show that it reliably detects microtubules, even at low signal-to-noise ratios. Then, we use TAMiT to measure monopolar spindle microtubule bundle number, length, and lifetime in a large dataset that includes several S. pombe mutants that affect microtubule dynamics and bundling. The results from the automated analysis are consistent with previous work, and suggest a direct role for CLASP/Cls1 in bundling spindle microtubules. We also illustrate automated tracking of single curved astral microtubules in S. cerevisiae, with measurement of dynamic instability parameters. The results obtained with our fully-automated software are similar to results using hand-tracked measurements. Therefore, TAMiT can facilitate automated analysis of spindle and microtubule dynamics in yeast cells.
5
Philpott DN, Chen K, Atwal RS, Li D, Christie J, Sargent EH, Kelley SO. Ultrathroughput immunomagnetic cell sorting platform. Lab on a Chip 2022; 22:4822-4830. [PMID: 36382608 DOI: 10.1039/d2lc00798c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
High-throughput phenotypic cell sorting is critical to the development of cell-based therapies and cell screening discovery platforms. However, current cytometry platforms are limited by throughput, number of fractionated populations that can be isolated, cell viability, and cost. We present an ultrathroughput microfluidic cell sorter capable of processing hundreds of millions of live cells per hour per device based on protein expression. This device, a next-generation microfluidic cell sorter (NG-MICS), combines multiple technologies, including 3D printing, reversible clamp sealing, and superhydrophobic treatments to create a reusable and user-friendly platform ready for deployment. The utility of such a platform is demonstrated through the rapid isolation of mature natural killer cells from peripheral blood mononuclear cells, for use in CAR-NK therapies at clinically-relevant scale.
Affiliation(s)
- David N Philpott
- Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, Ontario, Canada
- Kangfu Chen
- Department of Pharmaceutical Sciences, University of Toronto, Toronto, Ontario, Canada
- Randy S Atwal
- Department of Pharmaceutical Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL, USA
- Derek Li
- Department of Pharmaceutical Sciences, University of Toronto, Toronto, Ontario, Canada
- Jessie Christie
- Department of Pharmaceutical Sciences, University of Toronto, Toronto, Ontario, Canada
- Edward H Sargent
- Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA
- Shana O Kelley
- Department of Pharmaceutical Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL, USA
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Chemistry, Northwestern University, Evanston, IL, USA
- Department of Biomedical Engineering, Northwestern University, Evanston, IL, USA
6
Klein B, Hoel E, Swain A, Griebenow R, Levin M. Evolution and emergence: higher order information structure in protein interactomes across the tree of life. Integr Biol (Camb) 2021; 13:283-294. [PMID: 34933345 DOI: 10.1093/intbio/zyab020] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/25/2021] [Revised: 11/16/2021] [Accepted: 11/25/2021] [Indexed: 11/14/2022]
Abstract
The internal workings of biological systems are notoriously difficult to understand. Due to the prevalence of noise and degeneracy in evolved systems, in many cases the workings of everything from gene regulatory networks to protein-protein interactome networks remain black boxes. One consequence of this black-box nature is that it is unclear at which scale to analyze biological systems to best understand their function. We analyzed the protein interactomes of over 1800 species, containing in total 8 782 166 protein-protein interactions, at different scales. We show the emergence of higher order 'macroscales' in these interactomes and that these biological macroscales are associated with lower noise and degeneracy and therefore lower uncertainty. Moreover, the nodes in the interactomes that make up the macroscale are more resilient compared with nodes that do not participate in the macroscale. These effects are more pronounced in interactomes of eukaryota, as compared with prokaryota; these results hold even after sensitivity tests where we recalculate the emergent macroscales under network simulations where we add different edge weights to the interactomes. This points to plausible evolutionary adaptation for macroscales: biological networks evolve informative macroscales to gain benefits of both being uncertain at lower scales to boost their resilience, and also being 'certain' at higher scales to increase their effectiveness at information transmission. Our work explains some of the difficulty in understanding the workings of biological networks, since they are often most informative at a hidden higher scale, and demonstrates the tools to make these informative higher scales explicit.
7
Community development, implementation, and assessment of a NIBLSE bioinformatics sequence similarity learning resource. PLoS One 2021; 16:e0257404. [PMID: 34506617 PMCID: PMC8432852 DOI: 10.1371/journal.pone.0257404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Accepted: 08/31/2021] [Indexed: 11/19/2022]
Abstract
As powerful computational tools and 'big data' transform the biological sciences, bioinformatics training is becoming necessary to prepare the next generation of life scientists. Furthermore, because the tools and resources employed in bioinformatics are constantly evolving, bioinformatics learning materials must be continuously improved. In addition, these learning materials need to move beyond today's typical step-by-step guides to promote deeper conceptual understanding by students. One of the goals of the Network for Integrating Bioinformatics into Life Sciences Education (NIBLSE) is to create, curate, disseminate, and assess appropriate open-access bioinformatics learning resources. Here we describe the evolution, integration, and assessment of a learning resource that explores essential concepts of biological sequence similarity. Pre/post student assessment data from diverse life science courses show significant learning gains. These results indicate that the learning resource is a beneficial educational product for the integration of bioinformatics across curricula.
8
Identification of early liver toxicity gene biomarkers using comparative supervised machine learning. Sci Rep 2020; 10:19128. [PMID: 33154507 PMCID: PMC7645727 DOI: 10.1038/s41598-020-76129-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/12/2020] [Accepted: 10/12/2020] [Indexed: 02/08/2023]
Abstract
Screening agrochemicals and pharmaceuticals for potential liver toxicity is required for regulatory approval and is an expensive and time-consuming process. The identification and utilization of early exposure gene signatures and robust predictive models in regulatory toxicity testing has the potential to reduce time and costs substantially. In this study, comparative supervised machine learning approaches were applied to the rat liver TG-GATEs dataset to develop feature selection and predictive testing. We identified ten gene biomarkers using three different feature selection methods that predicted liver necrosis with high specificity and selectivity in an independent validation dataset from the Microarray Quality Control (MAQC)-II study. Nine of the ten genes that were selected with the supervised methods are involved in metabolism and detoxification (Car3, Crat, Cyp39a1, Dcd, Lbp, Scly, Slc23a1, and Tkfc) and transcriptional regulation (Ablim3). Several of these genes are also implicated in liver carcinogenesis, including Crat, Car3 and Slc23a1. Our biomarker gene signature provides high statistical accuracy and a manageable number of genes to study as indicators to potentially accelerate toxicity testing based on their ability to induce liver necrosis and, eventually, liver cancer.
9
Sielemann K, Hafner A, Pucker B. The reuse of public datasets in the life sciences: potential risks and rewards. PeerJ 2020; 8:e9954. [PMID: 33024631 PMCID: PMC7518187 DOI: 10.7717/peerj.9954] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 04/30/2020] [Accepted: 08/25/2020] [Indexed: 12/13/2022]
Abstract
The 'big data' revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define 'successful reuse' as the use of previously published data to enable novel scientific findings. By using selected examples of successful reuse from different disciplines, we illustrate the enormous potential of the practice, while acknowledging the respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of this practice as a norm has the potential to benefit all stakeholders in the life sciences.
Affiliation(s)
- Katharina Sielemann
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, Bielefeld, Germany
- Alenka Hafner
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Current Affiliation: Intercollege Graduate Degree Program in Plant Biology, Penn State University, University Park, State College, PA, United States of America
- Boas Pucker
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Evolution and Diversity, Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
10
Hoel E, Levin M. Emergence of informative higher scales in biological systems: a computational toolkit for optimal prediction and control. Commun Integr Biol 2020; 13:108-118. [PMID: 33014263 PMCID: PMC7518458 DOI: 10.1080/19420889.2020.1802914] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/10/2020] [Revised: 07/22/2020] [Accepted: 07/26/2020] [Indexed: 02/07/2023]
Abstract
The biological sciences span many spatial and temporal scales in attempts to understand the function and evolution of complex systems-level processes, such as embryogenesis. It is generally assumed that the most effective description of these processes is in terms of molecular interactions. However, recent developments in information theory and causal analysis now allow for the quantitative resolution of this question. In some cases, macro-scale models can minimize noise and increase the amount of information an experimenter or modeler has about "what does what." This result has numerous implications for evolution, pattern regulation, and biomedical strategies. Here, we provide an introduction to these quantitative techniques, and use them to show how informative macro-scales are common across biology. Our goal is to give biologists the tools to identify the maximally-informative scale at which to model, experiment on, predict, control, and understand complex biological systems.
Affiliation(s)
- Erik Hoel
- Allen Discovery Center, Tufts University, Medford, MA, USA
- Michael Levin
- Allen Discovery Center, Tufts University, Medford, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
11
Macklin DN, Ahn-Horst TA, Choi H, Ruggero NA, Carrera J, Mason JC, Sun G, Agmon E, DeFelice MM, Maayan I, Lane K, Spangler RK, Gillies TE, Paull ML, Akhter S, Bray SR, Weaver DS, Keseler IM, Karp PD, Morrison JH, Covert MW. Simultaneous cross-evaluation of heterogeneous E. coli datasets via mechanistic simulation. Science 2020; 369:eaav3751. [PMID: 32703847 PMCID: PMC7990026 DOI: 10.1126/science.aav3751] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 09/11/2018] [Revised: 10/28/2019] [Accepted: 05/26/2020] [Indexed: 12/24/2022]
Abstract
The extensive heterogeneity of biological data poses challenges to analysis and interpretation. Construction of a large-scale mechanistic model of Escherichia coli enabled us to integrate and cross-evaluate a massive, heterogeneous dataset based on measurements reported by various groups over decades. We identified inconsistencies with functional consequences across the data, including that the total output of the ribosomes and RNA polymerases described by data are not sufficient for a cell to reproduce measured doubling times, that measured metabolic parameters are neither fully compatible with each other nor with overall growth, and that essential proteins are absent during the cell cycle-and the cell is robust to this absence. Finally, considering these data as a whole leads to successful predictions of new experimental outcomes, in this case protein half-lives.
Affiliation(s)
- Derek N Macklin
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Travis A Ahn-Horst
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Heejo Choi
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Nicholas A Ruggero
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Department of Chemical Engineering, Stanford University, Stanford, CA 94305, USA
- Javier Carrera
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- John C Mason
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Gwanggyu Sun
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Eran Agmon
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Mialy M DeFelice
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Inbal Maayan
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Keara Lane
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Ryan K Spangler
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Taryn E Gillies
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Morgan L Paull
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Sajia Akhter
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Samuel R Bray
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Jerry H Morrison
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
- Markus W Covert
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center at Stanford University, Stanford University, Stanford, CA 94305, USA
12
Abstract
Osteoarthritis (OA) is an extremely common musculoskeletal disease. However, current guidelines are not well suited for diagnosing patients in the early stages of disease and do not discriminate patients for whom the disease might progress rapidly. The most important hurdle in OA management is identifying and classifying patients who will benefit most from treatment. Further efforts are needed in patient subgrouping and developing prediction models. Conventional statistical modelling approaches exist; however, these models are limited in the amount of information they can adequately process. Comprehensive patient-specific prediction models need to be developed. Approaches such as data mining and machine learning should aid in the development of such models. Although a challenging task, technology is now available that should enable subgrouping of patients with OA and lead to improved clinical decision-making and precision medicine.
13
Harfouche AL, Jacobson DA, Kainer D, Romero JC, Harfouche AH, Scarascia Mugnozza G, Moshelion M, Tuskan GA, Keurentjes JJ, Altman A. Accelerating Climate Resilient Plant Breeding by Applying Next-Generation Artificial Intelligence. Trends Biotechnol 2019; 37:1217-1235. [DOI: 10.1016/j.tibtech.2019.05.007] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 02/11/2019] [Revised: 05/18/2019] [Accepted: 05/23/2019] [Indexed: 12/20/2022]
14
Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures. BioMed Research International 2019; 2019:6750296. [PMID: 30809545 PMCID: PMC6369486 DOI: 10.1155/2019/6750296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/12/2018] [Accepted: 01/13/2019] [Indexed: 11/30/2022]
Abstract
In the field of biology, researchers need to compare genes or gene products using semantic similarity measures (SSM). Continuous data growth and diversity in data characteristics comprise what is called big data; current biological SSMs cannot handle big data. Therefore, these measures need the ability to control the size of big data. We used parallel and distributed processing by splitting data into multiple partitions and applied SSM measures to each partition; this approach helped manage big data scalability and computational problems. Our solution involves three steps: split gene ontology (GO), data clustering, and semantic similarity calculation. To test this method, split GO and data clustering algorithms were defined and assessed for performance in the first two steps. Three of the best SSMs in biology [Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA] are enhanced by introducing threaded parallel processing, which is used in the third step. Our results demonstrate that introducing threads in SSMs reduced the time of calculating semantic similarity between gene pairs and improved performance of the three SSMs. Average time was reduced by 24.51% for Resnik, 22.93%, for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, combined with using split GO and data clustering algorithms to split input data based on their similarity, reduced the average time more than did the approach of equally dividing input data. Time reduction increased with increasing number of splits. Time reduction percentage was 24.1%, 39.2%, and 66.6% for Threaded SSDD; 33.0%, 78.2%, and 93.1% for Threaded SORA in the case of 2, 3, and 4 slaves, respectively; and 92.04% for Threaded Resnik in the case of four slaves.
|
15
|
Schiffer J, Mael LE, Prather KA, Amaro RE, Grassian VH. Sea Spray Aerosol: Where Marine Biology Meets Atmospheric Chemistry. ACS CENTRAL SCIENCE 2018; 4:1617-1623. [PMID: 30648145 PMCID: PMC6311946 DOI: 10.1021/acscentsci.8b00674] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 09/21/2018] [Indexed: 05/25/2023]
Abstract
Atmospheric aerosols have long been known to alter climate by scattering incoming solar radiation and acting as seeds for cloud formation. These processes have vast implications for controlling the chemistry of our environment and the Earth's climate. Sea spray aerosol (SSA) is emitted over nearly three-quarters of our planet, yet precisely how SSA impacts Earth's radiation budget remains highly uncertain. Over the past several decades, studies have shown that SSA particles are far more complex than just sea salt. Ocean biological and physical processes produce individual SSA particles containing a diverse array of biological species including proteins, enzymes, bacteria, and viruses and a diverse array of organic compounds including fatty acids and sugars. Thus, a new frontier of research is emerging at the nexus of chemistry, biology, and atmospheric science. In this Outlook article, we discuss how current and future aerosol chemistry research demands a tight coupling between experimental (observational and laboratory studies) and computational (simulation-based) methods. This integration of approaches will enable the systematic interrogation of the complexity within individual SSA particles at a level that will enable prediction of the physicochemical properties of real-world SSA, ultimately illuminating the detailed mechanisms of how the constituents within individual SSA impact climate.
Affiliation(s)
- Jamie M. Schiffer
  - Department of Chemistry and Biochemistry and Department of Nanoengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0378, United States
- Liora E. Mael
  - Department of Chemistry and Biochemistry and Department of Nanoengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0378, United States
- Kimberly A. Prather
  - Department of Chemistry and Biochemistry and Department of Nanoengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0378, United States
  - Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093, United States
- Rommie E. Amaro
  - Department of Chemistry and Biochemistry and Department of Nanoengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0378, United States
- Vicki H. Grassian
  - Department of Chemistry and Biochemistry and Department of Nanoengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0378, United States
  - Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093, United States
|
16
|
Lee YS, Wong AK, Tadych A, Hartmann BM, Park CY, DeJesus VA, Ramos I, Zaslavsky E, Sealfon SC, Troyanskaya OG. Interpretation of an individual functional genomics experiment guided by massive public data. Nat Methods 2018; 15:1049-1052. [PMID: 30478325 PMCID: PMC6941785 DOI: 10.1038/s41592-018-0218-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/26/2018] [Accepted: 09/27/2018] [Indexed: 12/11/2022]
Abstract
A key unmet challenge in interpreting omics experiments is inferring biological meaning in the context of public functional genomics data. We developed a computational framework, Your Evidence Tailored Integration (YETI; http://yeti.princeton.edu/), which creates specialized functional interaction maps from large public datasets relevant to an individual omics experiment. Using this tailored integration, we predicted and experimentally confirmed an unexpected divergence in viral replication after seasonal or pandemic human influenza virus infection.
Affiliation(s)
- Young-suk Lee
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
  - Present address: School of Biological Sciences, Seoul National University, Seoul, Korea
- Aaron K. Wong
  - Flatiron Institute, Simons Foundation, New York, NY, USA
- Alicja Tadych
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Boris M. Hartmann
  - Department of Neurology and Center for Advanced Research on Diagnostic Assays, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Veronica A. DeJesus
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Irene Ramos
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Elena Zaslavsky
  - Department of Neurology and Center for Advanced Research on Diagnostic Assays, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Stuart C. Sealfon
  - Department of Neurology and Center for Advanced Research on Diagnostic Assays, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Olga G. Troyanskaya
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
  - Flatiron Institute, Simons Foundation, New York, NY, USA
|
17
|
Analysis of public RNA-sequencing data reveals biological consequences of genetic heterogeneity in cell line populations. Sci Rep 2018; 8:11226. [PMID: 30046134 PMCID: PMC6060100 DOI: 10.1038/s41598-018-29506-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 03/14/2018] [Accepted: 07/13/2018] [Indexed: 01/19/2023] Open
Abstract
Meta-analyses of datasets available in public repositories are used to gather and summarise experiments performed across laboratories, as well as to explore consistency of scientific findings. As data quality and biological equivalency across samples may obscure such analyses and consequently their conclusions, we investigated the comparability of 85 public RNA-seq cell line datasets. Thousands of pairwise comparisons of single nucleotide variants in 139 samples revealed variable genetic heterogeneity of the eight cell line populations analysed as well as variable data quality. The H9 and HCT116 cell lines were found to be remarkably stable across laboratories (with median concordances of 99.2% and 98.5%, respectively), in contrast to the highly variable HeLa cells (89.3%). We show that the genetic heterogeneity encountered greatly affects gene expression between same-cell comparisons, highlighting the importance of interrogating the biological equivalency of samples when comparing experimental datasets. Both the number of differentially expressed genes and the expression levels negatively correlate with the genetic heterogeneity. Finally, we demonstrate how comparing genetically heterogeneous datasets affects gene expression analyses and that high dissimilarity between same-cell datasets alters the expression of more than 300 cancer-related genes, which are often the focus of studies using cell lines.
|
18
|
Lu AX, Chong YT, Hsu IS, Strome B, Handfield LF, Kraus O, Andrews BJ, Moses AM. Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins. eLife 2018; 7:e31872. [PMID: 29620521 PMCID: PMC5935485 DOI: 10.7554/elife.31872] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/10/2017] [Accepted: 03/30/2018] [Indexed: 01/29/2023] Open
Abstract
The evaluation of protein localization changes on a systematic level is a powerful tool for understanding how cells respond to environmental, chemical, or genetic perturbations. To date, work in understanding these proteomic responses through high-throughput imaging has catalogued localization changes independently for each perturbation. To distinguish changes that are targeted responses to the specific perturbation or more generalized programs, we developed a scalable approach to visualize the localization behavior of proteins across multiple experiments as a quantitative pattern. By applying this approach to 24 experimental screens consisting of nearly 400,000 images, we differentiated specific responses from more generalized ones, discovered nuance in the localization behavior of stress-responsive proteins, and formed hypotheses by clustering proteins that have similar patterns. Previous approaches aim to capture all localization changes for a single screen as accurately as possible, whereas our work aims to integrate large amounts of imaging data to find unexpected new cell biology.
Affiliation(s)
- Alex X Lu
  - Department of Computer Science, University of Toronto, Toronto, Canada
- Yolanda T Chong
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Ian Shen Hsu
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Bob Strome
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Oren Kraus
  - Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Brenda J Andrews
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Alan M Moses
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
  - Center for Analysis of Genome Evolution and Function, University of Toronto, Toronto, Canada
|
19
|
Cell Cycle Model System for Advancing Cancer Biomarker Research. Sci Rep 2017; 7:17989. [PMID: 29269772 PMCID: PMC5740075 DOI: 10.1038/s41598-017-17845-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/25/2017] [Accepted: 11/27/2017] [Indexed: 01/14/2023] Open
Abstract
Progress in understanding the complexity of a devastating disease such as cancer has underscored the need for developing comprehensive panels of molecular markers for early disease detection and precision medicine applications. The present study was conducted to assess whether a cohesive biological context can be assigned to protein markers derived from public data mining, and whether mass spectrometry can be utilized to screen for the co-expression of functionally related biomarkers to be recommended for further exploration in clinical context. Cell cycle arrest/release experiments of MCF7/SKBR3 breast cancer and MCF10 non-tumorigenic cells were used as a surrogate to support the production of proteins relevant to aberrant cell proliferation. Information downloaded from the scientific public domain was queried with bioinformatics tools to generate an initial list of 1038 cancer-associated proteins. Mass spectrometric analysis of cell extracts identified 352 proteins that could be matched to the public list. Differential expression, enrichment, and protein-protein interaction analysis of the proteomic data revealed several functionally-related clusters of relevance to cancer. The results demonstrate that public data derived from independent experiments can be used to inform biological research and support the development of molecular assays for probing the characteristics of a disease.
|
20
|
Culley TM. The frontier of data discoverability: Why we need to share our data. APPLICATIONS IN PLANT SCIENCES 2017; 5:apps1700111. [PMID: 29109924 PMCID: PMC5664969 DOI: 10.3732/apps.1700111] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/12/2017] [Accepted: 09/20/2017] [Indexed: 06/07/2023]
Abstract
We are now in an era where sharing and making data widely accessible are not only expected within many disciplines, but also required by federal granting agencies and many scientific journals. In addition, there are practical reasons why authors should deposit their data into permanent data repositories: (1) it prevents data loss due to accidents, theft, or death of the researcher; (2) it enables published research to be reproduced by others; (3) publications associated with accessible data sets can have higher citation rates; (4) deposited data sets are increasingly recognized for scholarly recognition and professional advancement; and (5) stored and accessible data can be used in the future for projects that are unanticipated today. Applications in Plant Sciences requires that data underlying its articles be publicly accessible as a condition of publication to promote the continued advancement of the field of plant biology.
|
22
|
Yang A, Troup M, Ho JWK. Scalability and Validation of Big Data Bioinformatics Software. Comput Struct Biotechnol J 2017; 15:379-386. [PMID: 28794828 PMCID: PMC5537105 DOI: 10.1016/j.csbj.2017.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/30/2017] [Revised: 06/30/2017] [Accepted: 07/17/2017] [Indexed: 12/20/2022] Open
Abstract
This review examines two important aspects that are central to modern big data bioinformatics analysis – software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
Affiliation(s)
- Andrian Yang
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia
  - St. Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW 2010, Australia
- Michael Troup
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia
- Joshua W K Ho
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia
  - St. Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW 2010, Australia
|
23
|
Marshall DD, Powers R. Beyond the paradigm: Combining mass spectrometry and nuclear magnetic resonance for metabolomics. PROGRESS IN NUCLEAR MAGNETIC RESONANCE SPECTROSCOPY 2017; 100:1-16. [PMID: 28552170 PMCID: PMC5448308 DOI: 10.1016/j.pnmrs.2017.01.001] [Citation(s) in RCA: 154] [Impact Index Per Article: 19.3] [Received: 12/17/2016] [Revised: 01/04/2017] [Accepted: 01/08/2017] [Indexed: 05/02/2023]
Abstract
Metabolomics is undergoing tremendous growth and is being employed to solve a diversity of biological problems from environmental issues to the identification of biomarkers for human diseases. Nuclear magnetic resonance (NMR) and mass spectrometry (MS) are the analytical tools that are routinely, but separately, used to obtain metabolomics data sets due to their versatility, accessibility, and unique strengths. NMR requires minimal sample handling without the need for chromatography, is easily quantitative, and provides multiple means of metabolite identification, but is limited to detecting the most abundant metabolites (⩾1 μM). Conversely, mass spectrometry has the ability to measure metabolites at very low concentrations (femtomolar to attomolar) and has a higher resolution (∼10³-10⁴) and dynamic range (∼10³-10⁴), but quantitation is a challenge and sample complexity may limit metabolite detection because of ion suppression. Consequently, liquid chromatography (LC) or gas chromatography (GC) is commonly employed in conjunction with MS, but this may lead to other sources of error. As a result, NMR and mass spectrometry are highly complementary, and combining the two techniques is likely to improve the overall quality of a study and enhance the coverage of the metabolome. While the majority of metabolomic studies use a single analytical source, there is a growing appreciation of the inherent value of combining NMR and MS for metabolomics. An overview of the current state of utilizing both NMR and MS for metabolomics will be presented.
Affiliation(s)
- Darrell D Marshall
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68588-0304, United States
- Robert Powers
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68588-0304, United States
|
24
|
Expanding the Immunology Toolbox: Embracing Public-Data Reuse and Crowdsourcing. Immunity 2016; 45:1191-1204. [DOI: 10.1016/j.immuni.2016.12.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 09/12/2016] [Revised: 11/30/2016] [Accepted: 12/01/2016] [Indexed: 12/15/2022]
|
25
|
Robinson JL, Nielsen J. Integrative analysis of human omics data using biomolecular networks. MOLECULAR BIOSYSTEMS 2016; 12:2953-64. [PMID: 27510223 DOI: 10.1039/c6mb00476h] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 12/28/2022]
Abstract
High-throughput '-omics' technologies have given rise to an increasing abundance of genome-scale data detailing human biology at the molecular level. Although these datasets have already made substantial contributions to a more comprehensive understanding of human physiology and diseases, their interpretation becomes increasingly cryptic and nontrivial as they continue to expand in size and complexity. Systems biology networks offer a scaffold upon which omics data can be integrated, facilitating the extraction of new and physiologically relevant information from the data. Two of the most prevalent networks that have been used for such integrative analyses of omics data are genome-scale metabolic models (GEMs) and protein-protein interaction (PPI) networks, both of which have demonstrated success among many different omics and sample types. This integrative approach seeks to unite 'top-down' omics data with 'bottom-up' biological networks in a synergistic fashion that draws on the strengths of both strategies. As the volume and resolution of high-throughput omics data continue to grow, integrative network-based analyses are expected to play an increasingly important role in their interpretation.
Affiliation(s)
- Jonathan L Robinson
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96 Gothenburg, Sweden
|
26
|
King ZA, Lu J, Dräger A, Miller P, Federowicz S, Lerman JA, Ebrahim A, Palsson BO, Lewis NE. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res 2016; 44:D515-22. [PMID: 26476456 PMCID: PMC4702785 DOI: 10.1093/nar/gkv1049] [Citation(s) in RCA: 546] [Impact Index Per Article: 60.7] [Received: 08/14/2015] [Revised: 09/27/2015] [Accepted: 10/02/2015] [Indexed: 11/14/2022] Open
Abstract
Genome-scale metabolic models are mathematically-structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repositories of high-quality models must be established, models must adhere to established standards and model components must be linked to relevant databases. Tools for model visualization further enhance their utility. To meet these needs, we present BiGG Models (http://bigg.ucsd.edu), a completely redesigned Biochemical, Genetic and Genomic knowledge base. BiGG Models contains more than 75 high-quality, manually-curated genome-scale metabolic models. On the website, users can browse, search and visualize models. BiGG Models connects genome-scale models to genome annotations and external databases. Reaction and metabolite identifiers have been standardized across models to conform to community standards and enable rapid comparison across models. Furthermore, BiGG Models provides a comprehensive application programming interface for accessing BiGG Models with modeling and analysis tools. As a resource for highly curated, standardized and accessible models of metabolism, BiGG Models will facilitate diverse systems biology studies and support knowledge-based analysis of diverse experimental data.
Affiliation(s)
- Zachary A King
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Justin Lu
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Andreas Dräger
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
  - Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Philip Miller
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Stephen Federowicz
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Joshua A Lerman
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Ali Ebrahim
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Bernhard O Palsson
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability at the University of California, San Diego, La Jolla, CA 92093, USA
- Nathan E Lewis
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability at the University of California, San Diego, La Jolla, CA 92093, USA
|
27
|
Lippincott-Schwartz J. Interdisciplinary innovations are key to effective use of quantitative biological information. Mol Biol Cell 2015; 26:3893. [PMID: 26543194 PMCID: PMC4710218 DOI: 10.1091/mbc.e15-09-0644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Affiliation(s)
- Jennifer Lippincott-Schwartz
  - Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892
|