1
Vasanthakumari P, Zhu Y, Brettin T, Partin A, Shukla M, Xia F, Narykov O, Weil MR, Stevens RL. A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening. Cancers (Basel) 2024; 16:530. [PMID: 38339281 PMCID: PMC10854925 DOI: 10.3390/cancers16030530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/12/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024]
Abstract
It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies for selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, i.e., selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
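The selection strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rules, so everything here is an assumption: the function name `select_batch`, the convention that a lower predicted response value means a more responsive cell line, and the use of an ensemble's standard deviation as the uncertainty measure are all hypothetical.

```python
import numpy as np

def select_batch(pred_mean, pred_std, k, strategy="greedy", rng=None):
    """Pick k candidate cell lines to screen next (illustrative only).

    pred_mean: predicted response per candidate; lower is assumed to
               mean more responsive (an assumption, not from the paper).
    pred_std:  per-candidate spread of predictions, used as uncertainty.
    """
    pred_mean = np.asarray(pred_mean, dtype=float)
    pred_std = np.asarray(pred_std, dtype=float)
    if strategy == "random":
        rng = rng or np.random.default_rng(0)
        return rng.choice(len(pred_mean), size=k, replace=False)
    if strategy == "greedy":
        score = -pred_mean          # favor predicted responders
    elif strategy == "uncertainty":
        score = pred_std            # favor candidates the model is unsure about
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Highest score first; a stable sort keeps ties in input order.
    return np.argsort(-score, kind="stable")[:k]

pred_mean = [0.9, 0.2, 0.5, 0.1]
pred_std = [0.05, 0.01, 0.30, 0.02]
greedy_pick = select_batch(pred_mean, pred_std, k=2, strategy="greedy")
uncertain_pick = select_batch(pred_mean, pred_std, k=2, strategy="uncertainty")
```

Under these assumptions the greedy rule chases likely hits while the uncertainty rule chases informative experiments, which is the trade-off the two evaluation criteria in the abstract are meant to capture.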
Affiliation(s)
- Priyanka Vasanthakumari
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Michael Ryan Weil
  - Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA
- Rick L. Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
2
Narykov O, Zhu Y, Brettin T, Evrard YA, Partin A, Shukla M, Xia F, Clyde A, Vasanthakumari P, Doroshow JH, Stevens RL. Integration of Computational Docking into Anti-Cancer Drug Response Prediction Models. Cancers (Basel) 2023; 16:50. [PMID: 38201477 PMCID: PMC10777918 DOI: 10.3390/cancers16010050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/12/2024]
Abstract
Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.
Affiliation(s)
- Oleksandr Narykov
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Yvonne A. Evrard
  - Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Alexander Partin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Austin Clyde
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
- Priyanka Vasanthakumari
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- James H. Doroshow
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA
- Rick L. Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
3
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023]
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps to better understand the current state of the field and to identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
  - Department of Computer Science, The University of Chicago, Chicago, IL, United States
4
Partin A, Brettin T, Zhu Y, Dolezal JM, Kochanny S, Pearson AT, Shukla M, Evrard YA, Doroshow JH, Stevens RL. Data augmentation and multimodal learning for predicting drug response in patient-derived xenografts from gene expressions and histology images. Front Med (Lausanne) 2023; 10:1058919. [PMID: 36960342 PMCID: PMC10027779 DOI: 10.3389/fmed.2023.1058919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 02/10/2023] [Indexed: 03/09/2023]
Abstract
Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate a multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods that allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combining single-drug and drug-pair treatments by homogenizing drug representations, and 2) augmenting drug-pairs, which doubles the sample size of all drug-pair samples. Unimodal NNs that use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug-pairs or the single-drug treatments. In assessing multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
  - *Correspondence: Alexander Partin
- Thomas Brettin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- James M. Dolezal
  - Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Sara Kochanny
  - Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Alexander T. Pearson
  - Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Maulik Shukla
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yvonne A. Evrard
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, United States
- James H. Doroshow
  - Division of Cancer Therapeutics and Diagnosis, National Cancer Institute, Bethesda, MD, United States
- Rick L. Stevens
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
  - Department of Computer Science, The University of Chicago, Chicago, IL, United States
5
Partin A, Brettin TS, Zhu Y, Shukla M, Xia F, Yoo H, Dolezal JM, Kochanny S, Pearson AT, Evrard YA, Doroshow JH, Stevens RL. Drug response prediction in patient-derived xenografts with data augmentation and multimodal deep learning. J Clin Oncol 2022. [DOI: 10.1200/jco.2022.40.16_suppl.e13572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
e13572 Background: Prediction of drug response is a critical research area in precision oncology and has been previously explored with large drug screening studies of cancer cell lines (CCLs). Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies because the in vivo environment of PDXs helps preserve tumor heterogeneity and usually better mimics drug response of patients with cancer compared to CCLs. Methods: We investigate a multimodal neural network (NN) and data augmentation for drug response prediction in PDXs. The multimodal NN learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs), where the multi-modality refers to tumor features only. The NN uses late integration, where separate subnetworks are used to encode the input feature types before concatenation and prediction layers. Median tumor volume per treatment group is assessed relative to the control group to create a binary variable representing response. The data include twelve single-drug and 36 drug-pair treatments, resulting in 2,556 single-drug and 2,203 drug-pair response values. Pathology and omics data from 487 PDXs from NCI's Patient Derived Models Repository are used as tumor feature model inputs. We explore whether the integration of WSIs with GE improves predictions as compared with models that use GE alone. We use two methods to address the limited number of response values in the dataset: 1) homogenizing drug representations, which allows single-drug and drug-pair treatments to be combined into a single dataset, and 2) augmenting drug-pair samples by switching the order of drug features, which doubles the sample size of all drug-pair samples. These methods enable us to combine single-drug and drug-pair treatments, which results in 6,962 responses, allowing us to train multimodal and unimodal NNs without changing architectures or the dataset.
Results: The prediction performance of three unimodal NNs that use GE (um1, um2, and um3) is compared to assess the contribution of data augmentation methods. NN um1, which uses the full dataset (the original and the augmented drug-pair treatments as well as single-drug treatments), significantly outperforms NNs (p-values < 0.01) that ignore either the augmented drug-pairs (um2) or the single-drug treatments (um3). In assessing the contribution of multimodal learning, results show that the multimodal NN (mm) outperforms both unimodal NNs that ignore either the GE (um4) or the WSIs (um1). However, the improvement of mm over um1 is not statistically significant (p-value < 0.26). Conclusions: Our results show that data augmentation and integration of histology images and GE can help improve prediction performance of drug response in PDXs. [Table: see text]
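The drug-pair augmentation described in the Methods (switching the order of the two drugs' features to double the drug-pair samples) can be sketched as follows. This is only an illustration under assumed array layouts; the function name `augment_drug_pairs` and the toy feature values are hypothetical and not taken from the study.

```python
import numpy as np

def augment_drug_pairs(drug1, drug2, tumor, response):
    """Append a swapped copy of every drug-pair sample.

    drug1, drug2: (n_samples, n_drug_features) descriptors of each drug
    tumor:        (n_samples, n_tumor_features) tumor features
    response:     (n_samples,) binary response labels
    The swapped copies keep the same tumor features and label, since the
    treatment itself does not depend on the order of the two drugs.
    """
    d1 = np.concatenate([drug1, drug2])   # originals, then swapped
    d2 = np.concatenate([drug2, drug1])
    t = np.concatenate([tumor, tumor])
    y = np.concatenate([response, response])
    return d1, d2, t, y

drug1 = np.array([[1.0, 2.0], [3.0, 4.0]])
drug2 = np.array([[5.0, 6.0], [7.0, 8.0]])
tumor = np.array([[0.1], [0.2]])
response = np.array([1, 0])
d1, d2, t, y = augment_drug_pairs(drug1, drug2, tumor, response)
```

A model trained on the augmented set sees both orderings of each pair, so it is nudged toward predictions that do not depend on which drug occupies which input slot.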
Affiliation(s)
- Yitan Zhu
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Maulik Shukla
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Fangfang Xia
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Hyunseung Yoo
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Sara Kochanny
  - Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL
- Yvonne A. Evrard
  - Molecular Characterization Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD
- Rick L. Stevens
  - Department of Energy, Argonne National Laboratory, Lemont, IL
6
Zhu Y, Brettin TS, Partin A, Xia F, Shukla M, Yoo H, Cancino A, Larsen B, Shaxted J, Salahudeen A, White K, Stevens RL. Multifactorial drug response modeling based on cancer organoid data. J Clin Oncol 2022. [DOI: 10.1200/jco.2022.40.16_suppl.e13544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
e13544 Background: Prediction of drug response based on cancer molecular profiles is of paramount importance for precision oncology. Most existing drug response prediction models are built using drug screening data of immortalized cancer cell lines, which usually have altered genomic profiles compared with patient tumors. Recently, patient-derived organoids (PDOs) have emerged as a promising platform that better represents patient tumors. We build computational drug response prediction models based on PDO drug screening data, which is the first study of its type to our knowledge. Methods: We successfully developed 27 PDO lines of colorectal cancer and 20 PDO lines of head and neck (H&N) cancer. Transcriptomics, copy number variation (CNV), and targeted DNA mutation data were generated for the PDO lines. The PDO lines were screened with 36 drugs of diversified mechanisms. The area under the dose response curve was taken as the response measurement. We used the LightGBM algorithm to build response prediction models based on cancer molecular data and drug chemical descriptors/fingerprints. To investigate the influence of different factors on the prediction performance, including different cancer types, cancer molecular features, drug features, data preprocessing methods, and others, we applied a multifactorial analysis scenario to build and evaluate 3,384 prediction models constructed with all possible combinations of the factors. For example, we built prediction models for H&N and colorectal PDOs separately and jointly. Results: A prediction model built for H&N PDOs achieved the highest prediction performance among all prediction models, with an R² of 0.790 in 10-fold cross-validation. The model was built using drug descriptors, CNVs, and expressions of “landmark” genes well-representing cellular transcriptomic changes identified in the LINCS project. The table below includes all the factorial differences that caused an average R² change larger than 1%.
All R² changes are statistically significant (p-values < 1×10⁻⁵⁰), evaluated by pair-wise t-tests comparing models built with the status of the factor changed. Prediction performance increased from colorectal cancer, to the two cancer types combined, to H&N cancer. Gene expression data, either whole-transcriptome or the subset of LINCS genes, boosted the prediction performance. Between the two different dyes used to stain dead cells, TO-PRO-3 provided a higher prediction performance than Caspase-3/7. Conclusions: The highest drug response prediction performance achieved is an R² of 0.790. Cancer type, dye, and whether gene expressions are used in modeling are the factors most influential on prediction performance. [Table: see text]
Affiliation(s)
- Yitan Zhu
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Fangfang Xia
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Maulik Shukla
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Hyunseung Yoo
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Rick L. Stevens
  - Department of Energy, Argonne National Laboratory, Lemont, IL
7
Zhu Y, Brettin T, Xia F, Partin A, Shukla M, Yoo H, Evrard YA, Doroshow JH, Stevens RL. Converting tabular data into images for deep learning with convolutional neural networks. Sci Rep 2021; 11:11325. [PMID: 34059739 PMCID: PMC8166880 DOI: 10.1038/s41598-021-90923-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/01/2021] [Accepted: 05/17/2021] [Indexed: 12/11/2022]
Abstract
Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features, such as speech and imaging. However, most tabular data do not assume a spatial relationship between features, and thus are unsuitable for modeling using CNNs. To meet this challenge, we develop a novel algorithm, image generator for tabular data (IGTD), to transform tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image. The algorithm searches for an optimized assignment by minimizing the difference between the ranking of distances between features and the ranking of distances between their assigned pixels in the image. We apply IGTD to transform gene expression profiles of cancer cell lines (CCLs) and molecular descriptors of drugs into their respective image representations. Compared with existing transformation methods, IGTD generates compact image representations with better preservation of feature neighborhood structure. Evaluated on benchmark drug screening datasets, CNNs trained on IGTD image representations of CCLs and drugs exhibit a better performance of predicting anti-cancer drug response than both CNNs trained on alternative image representations and prediction models trained on the original tabular data.
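The objective described above, the mismatch between the ranking of feature-to-feature distances and the ranking of distances between the assigned pixels, can be sketched roughly as below. This is not the published IGTD implementation: the function names, the Euclidean pixel distance, the absolute-difference error, and the tie handling are all assumptions made for illustration.

```python
import numpy as np

def rank_upper(dist):
    """Rank the pairwise distances in the upper triangle (0 = smallest).

    Ties are broken by position, which is a simplification.
    """
    iu = np.triu_indices_from(dist, k=1)
    return dist[iu].argsort(kind="stable").argsort(kind="stable")

def assignment_error(feat_dist, pixel_coords, assignment):
    """Sum of absolute rank differences for one feature-to-pixel assignment.

    feat_dist:    (n, n) symmetric matrix of distances between features
    pixel_coords: (n, 2) row/column coordinates of the image pixels
    assignment:   assignment[i] is the pixel index given to feature i
    """
    coords = np.asarray(pixel_coords, dtype=float)[np.asarray(assignment)]
    diff = coords[:, None, :] - coords[None, :, :]
    pix_dist = np.sqrt((diff ** 2).sum(axis=-1))
    feat_rank = rank_upper(np.asarray(feat_dist, dtype=float))
    return int(np.abs(feat_rank - rank_upper(pix_dist)).sum())

# Toy case: four features whose distances match a 1x4 strip of pixels.
coords = np.array([[0, 0], [0, 1], [0, 2], [0, 3]])
feat_dist = np.abs(coords[:, None, 1] - coords[None, :, 1]).astype(float)
err_good = assignment_error(feat_dist, coords, [0, 1, 2, 3])
err_bad = assignment_error(feat_dist, coords, [0, 2, 1, 3])
```

A search procedure would then try swaps of pixel assignments and keep those that lower this error; that search loop is omitted here.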
Affiliation(s)
- Yitan Zhu
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Fangfang Xia
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Alexander Partin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Maulik Shukla
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Hyunseung Yoo
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Yvonne A Evrard
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, 21702, USA
- James H Doroshow
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, 20892, USA
- Rick L Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL, 60637, USA
8
Partin A, Brettin T, Evrard YA, Zhu Y, Yoo H, Xia F, Jiang S, Clyde A, Shukla M, Fonstein M, Doroshow JH, Stevens RL. Learning curves for drug response prediction in cancer cell lines. BMC Bioinformatics 2021; 22:252. [PMID: 34001007 PMCID: PMC8130157 DOI: 10.1186/s12859-021-04163-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/29/2020] [Accepted: 05/04/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
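Fitting a learning curve to a power law, as the abstract describes, might look like the sketch below. The abstract does not state the exact functional form, so the two-parameter form error ≈ a · m^b (with m the training set size), the log-log least-squares fit, and the function names are assumptions; the data points are synthetic.

```python
import numpy as np

def fit_power_law(train_sizes, errors):
    """Fit error ~= a * m**b by linear least squares in log-log space.

    Returns (a, b); b is negative when error falls as data grows.
    """
    log_m = np.log(np.asarray(train_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    b, log_a = np.polyfit(log_m, log_e, deg=1)   # slope, intercept
    return float(np.exp(log_a)), float(b)

def predict_error(m, a, b):
    """Extrapolate the fitted curve to a training set size m."""
    return a * np.asarray(m, dtype=float) ** b

# Synthetic learning curve that follows 2.0 * m**-0.5 exactly.
sizes = np.array([100, 200, 400, 800])
errors = 2.0 * sizes ** -0.5
a, b = fit_power_law(sizes, errors)
projected = predict_error(1600, a, b)
```

Extrapolating with `predict_error` is the "forward-looking" use mentioned in the conclusions: it gives a rough estimate of how much error reduction additional screening data might buy, within the limits of the assumed curve shape.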
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Thomas Brettin
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA
- Yvonne A Evrard
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Hyunseung Yoo
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Fangfang Xia
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Songhao Jiang
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Austin Clyde
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Maulik Shukla
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Michael Fonstein
  - Biosciences Division, Argonne National Laboratory, Lemont, IL, USA
- James H Doroshow
  - Division of Cancer Therapeutics and Diagnosis, National Cancer Institute, Bethesda, MD, USA
- Rick L Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
9
Zhu Y, Brettin T, Evrard YA, Partin A, Xia F, Shukla M, Yoo H, Doroshow JH, Stevens RL. Ensemble transfer learning for the prediction of anti-cancer drug response. Sci Rep 2020; 10:18040. [PMID: 33093487 PMCID: PMC7581765 DOI: 10.1038/s41598-020-74921-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 07/24/2020] [Accepted: 10/08/2020] [Indexed: 12/13/2022]
Abstract
Transfer learning, which transfers patterns learned on a source dataset to a related target dataset for constructing prediction models, has been shown effective in many applications. In this paper, we investigate whether transfer learning can be used to improve the performance of anti-cancer drug response prediction models. Previous transfer learning studies for drug response prediction focused on building models to predict the response of tumor cells to a specific drug treatment. We target the more challenging task of building general prediction models that can make predictions for both new tumor cells and new drugs. Uniquely, we investigate the power of transfer learning for three drug response prediction applications including drug repurposing, precision oncology, and new drug development, through different data partition schemes in cross-validation. We extend the classic transfer learning framework through ensemble and demonstrate its general utility with three representative prediction algorithms including a gradient boosting model and two deep neural networks. The ensemble transfer learning framework is tested on benchmark in vitro drug screening datasets. The results demonstrate that our framework broadly improves the prediction performance in all three drug response prediction applications with all three prediction algorithms.
Affiliation(s)
- Yitan Zhu
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Yvonne A Evrard
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, 21702, USA
- Alexander Partin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Fangfang Xia
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Maulik Shukla
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- Hyunseung Yoo
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
- James H Doroshow
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, 20892, USA
- Rick L Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL, 60637, USA
10
|
Zhu Y, Brettin TS, Xia F, Shukla M, Partin A, Yoo H, Stevens RL. Enhanced co-expression extrapolation (COXEN) gene selection method for building anticancer drug response prediction models. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.15_suppl.e14073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
e14073 Background: Accurate prediction of tumor response to a drug treatment is of paramount importance for precision oncology. The co-expression extrapolation (COXEN) gene selection approach has been successfully used in multiple studies to select genes for predicting the response of tumor cells to a specific drug. Here, we enhance the original COXEN approach to select genes that are predictive of the efficacies of multiple drugs simultaneously for building a general drug response prediction model. Methods: We implemented two methods to select predictive genes. The first method ranks the genes according to their prediction power for each individual drug and then takes a union of the top predictive genes of all the drugs. The second method uses a linear regression model to evaluate the prediction power of a gene for all drugs while the drugs are one-hot encoded in the regression model. Among the predictive genes, we further select genes by evaluating the preservation of co-expression patterns between cancer cases with drug response data available and cancer cases for which drug response needs to be predicted, because the preservation of co-expression patterns indicates the similarity of genomic regulations between cancer cases. Results: To test the enhanced COXEN method, we used a LightGBM regression model to predict drug response based on the selected genes on two benchmark in vitro drug screening datasets. The table below compares the performance of prediction models built based on 200 genes selected by the enhanced COXEN method to that of models built on 200 genes randomly picked from the LINCS gene set, which includes 976 “landmark” genes well-representing cellular transcriptomic changes identified in the Library of Integrated Network-Based Cellular Signatures (LINCS) project.
The enhanced COXEN approach selects genes better than random LINCS genes, as demonstrated by the increased average coefficient of determination (R²) for predicting the area under the dose response curve through cross-validation. A pairwise t-test indicates that the improvement is statistically significant (p-value ≤ 0.05) on both datasets. Conclusions: Our result demonstrates the benefit of using an enhanced COXEN approach to select genes for building a general drug response prediction model. [Table: see text]
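As a rough illustration of the first selection method described in the abstract above (rank genes per drug, then take the union of the top genes), the following sketch may help. It is not the authors' implementation: the function name `top_genes_union`, the use of absolute Pearson correlation as the "prediction power" score, and the toy data are all assumptions made for illustration. The second (one-hot regression) method and the co-expression preservation filter are not covered.

```python
import numpy as np


def top_genes_union(expression, responses, top_k):
    """Return sorted gene indices in the union of each drug's top_k genes.

    expression: array of shape (n_samples, n_genes).
    responses: dict mapping a drug name to an array of shape (n_samples,).
    Scoring by absolute Pearson correlation is an assumption of this sketch.
    """
    centered_x = expression - expression.mean(axis=0)
    x_norm = np.linalg.norm(centered_x, axis=0)
    selected = set()
    for y in responses.values():
        centered_y = y - y.mean()
        denom = x_norm * np.linalg.norm(centered_y)
        # Constant genes or responses get a correlation of zero.
        corr = centered_x.T @ centered_y / np.where(denom == 0, np.inf, denom)
        ranked = np.argsort(-np.abs(corr))  # most predictive genes first
        selected.update(int(g) for g in ranked[:top_k])
    return sorted(selected)


# Toy data (hypothetical): 6 samples, 4 genes.
expression = np.array([
    [1.0, 5.0, 2.0, 0.0],
    [2.0, 3.0, 9.0, 1.0],
    [3.0, 8.0, 4.0, 0.0],
    [4.0, 1.0, 7.0, 1.0],
    [5.0, 6.0, 1.0, 0.0],
    [6.0, 2.0, 8.0, 1.0],
])
responses = {
    "drug_a": 2.0 * expression[:, 0] + 1.0,  # driven by gene 0
    "drug_b": -expression[:, 2],             # driven by gene 2
}
selected = top_genes_union(expression, responses, top_k=1)
```

With `top_k=1`, each toy drug contributes its single most correlated gene, so the union contains one gene per drug.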
Affiliation(s)
- Yitan Zhu
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Fangfang Xia
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Maulik Shukla
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Hyunseung Yoo
  - Department of Energy, Argonne National Laboratory, Lemont, IL
- Rick L. Stevens
  - Department of Energy, Argonne National Laboratory, Lemont, IL
|
11
|
Antonopoulos DA, Assaf R, Aziz RK, Brettin T, Bun C, Conrad N, Davis JJ, Dietrich EM, Disz T, Gerdes S, Kenyon RW, Machi D, Mao C, Murphy-Olson DE, Nordberg EK, Olsen GJ, Olson R, Overbeek R, Parrello B, Pusch GD, Santerre J, Shukla M, Stevens RL, VanOeffelen M, Vonstein V, Warren AS, Wattam AR, Xia F, Yoo H. PATRIC as a unique resource for studying antimicrobial resistance. Brief Bioinform 2020; 20:1094-1102. [PMID: 28968762 PMCID: PMC6781570 DOI: 10.1093/bib/bbx083] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 04/30/2017] [Revised: 06/13/2017] [Indexed: 02/07/2023] Open
Abstract
The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other ‘omic’ data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.
Affiliation(s)
- Alice R Wattam
  - Corresponding author: Alice R. Wattam, Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, Blacksburg, VA 24061 USA. Tel.: 540-231-1263; Fax: 540-231-2606; E-mail:
|
12
|
Amann RI, Baichoo S, Blencowe BJ, Bork P, Borodovsky M, Brooksbank C, Chain PSG, Colwell RR, Daffonchio DG, Danchin A, de Lorenzo V, Dorrestein PC, Finn RD, Fraser CM, Gilbert JA, Hallam SJ, Hugenholtz P, Ioannidis JPA, Jansson JK, Kim JF, Klenk HP, Klotz MG, Knight R, Konstantinidis KT, Kyrpides NC, Mason CE, McHardy AC, Meyer F, Ouzounis CA, Patrinos AAN, Podar M, Pollard KS, Ravel J, Muñoz AR, Roberts RJ, Rosselló-Móra R, Sansone SA, Schloss PD, Schriml LM, Setubal JC, Sorek R, Stevens RL, Tiedje JM, Turjanski A, Tyson GW, Ussery DW, Weinstock GM, White O, Whitman WB, Xenarios I. Consent insufficient for data release-Response. Science 2019; 364:446. [PMID: 31048484 DOI: 10.1126/science.aax7509] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/02/2022]
|
13
|
Nguyen M, Long SW, McDermott PF, Olsen RJ, Olson R, Stevens RL, Tyson GH, Zhao S, Davis JJ. Using Machine Learning To Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella. J Clin Microbiol 2019; 57:e01260-18. [PMID: 30333126 PMCID: PMC6355527 DOI: 10.1128/jcm.01260-18] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 08/03/2018] [Accepted: 09/25/2018] [Indexed: 11/20/2022] Open
Abstract
Nontyphoidal Salmonella species are the leading bacterial cause of foodborne disease in the United States. Whole-genome sequences and paired antimicrobial susceptibility data are available for Salmonella strains because of surveillance efforts from public health agencies. In this study, a collection of 5,278 nontyphoidal Salmonella genomes, collected over 15 years in the United States, was used to generate extreme gradient boosting (XGBoost)-based machine learning models for predicting MICs for 15 antibiotics. The MIC prediction models had an overall average accuracy of 95% within ±1 2-fold dilution step (confidence interval, 95% to 95%), an average very major error rate of 2.7% (confidence interval, 2.4% to 3.0%), and an average major error rate of 0.1% (confidence interval, 0.1% to 0.2%). The model predicted MICs with no a priori information about the underlying gene content or resistance phenotypes of the strains. By selecting diverse genomes for the training sets, we show that highly accurate MIC prediction models can be generated with less than 500 genomes. We also show that our approach for predicting MICs is stable over time, despite annual fluctuations in antimicrobial resistance gene content in the sampled genomes. Finally, using feature selection, we explore the important genomic regions identified by the models for predicting MICs. To date, this is one of the largest MIC modeling studies to be published. Our strategy for developing whole-genome sequence-based models for surveillance and clinical diagnostics can be readily applied to other important human pathogens.
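The accuracy criterion quoted in the abstract above (a prediction counts as correct when it falls within ±1 two-fold dilution step of the measured MIC) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example MIC values are hypothetical, and the comparison on a log2 scale is one common way to express a two-fold dilution step.

```python
import math


def within_one_dilution(predicted_mic, measured_mic):
    """True if two MICs differ by at most one two-fold dilution step."""
    # One two-fold dilution step equals a difference of 1 on the log2 scale.
    difference = abs(math.log2(predicted_mic) - math.log2(measured_mic))
    return difference <= 1.0 + 1e-9  # small tolerance for rounding


def dilution_accuracy(predicted, measured):
    """Fraction of predictions within +/-1 two-fold dilution step."""
    hits = sum(within_one_dilution(p, m) for p, m in zip(predicted, measured))
    return hits / len(measured)


# Hypothetical MIC values in micrograms per millilitre.
predicted = [0.5, 2.0, 8.0, 16.0]
measured = [1.0, 2.0, 2.0, 32.0]
accuracy = dilution_accuracy(predicted, measured)
```

In this toy set, the third prediction (8 against a measured 2) is two dilution steps off and is the only miss.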
Affiliation(s)
- Marcus Nguyen
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, Illinois, USA
- S Wesley Long
  - Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, USA
- Patrick F McDermott
  - U.S. Food and Drug Administration, Center for Veterinary Medicine, Office of Research, Laurel, Maryland, USA
- Randall J Olsen
  - Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, USA
- Robert Olson
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, Illinois, USA
- Rick L Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, Illinois, USA
  - Department of Computer Science, University of Chicago, Chicago, Illinois, USA
- Gregory H Tyson
  - U.S. Food and Drug Administration, Center for Veterinary Medicine, Office of Research, Laurel, Maryland, USA
- Shaohua Zhao
  - U.S. Food and Drug Administration, Center for Veterinary Medicine, Office of Research, Laurel, Maryland, USA
- James J Davis
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, Illinois, USA
|
14
|
Amann RI, Baichoo S, Blencowe BJ, Bork P, Borodovsky M, Brooksbank C, Chain PSG, Colwell RR, Daffonchio DG, Danchin A, de Lorenzo V, Dorrestein PC, Finn RD, Fraser CM, Gilbert JA, Hallam SJ, Hugenholtz P, Ioannidis JPA, Jansson JK, Kim JF, Klenk HP, Klotz MG, Knight R, Konstantinidis KT, Kyrpides NC, Mason CE, McHardy AC, Meyer F, Ouzounis CA, Patrinos AAN, Podar M, Pollard KS, Ravel J, Muñoz AR, Roberts RJ, Rosselló-Móra R, Sansone SA, Schloss PD, Schriml LM, Setubal JC, Sorek R, Stevens RL, Tiedje JM, Turjanski A, Tyson GW, Ussery DW, Weinstock GM, White O, Whitman WB, Xenarios I. Toward unrestricted use of public genomic data. Science 2019; 363:350-352. [PMID: 30679363 DOI: 10.1126/science.aaw1280] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 12/21/2022]
Abstract
Publication interests should not limit access to public data
|
15
|
Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, Sneddon MW, Henderson ML, Riehl WJ, Murphy-Olson D, Chan SY, Kamimura RT, Kumari S, Drake MM, Brettin TS, Glass EM, Chivian D, Gunter D, Weston DJ, Allen BH, Baumohl J, Best AA, Bowen B, Brenner SE, Bun CC, Chandonia JM, Chia JM, Colasanti R, Conrad N, Davis JJ, Davison BH, DeJongh M, Devoid S, Dietrich E, Dubchak I, Edirisinghe JN, Fang G, Faria JP, Frybarger PM, Gerlach W, Gerstein M, Greiner A, Gurtowski J, Haun HL, He F, Jain R, Joachimiak MP, Keegan KP, Kondo S, Kumar V, Land ML, Meyer F, Mills M, Novichkov PS, Oh T, Olsen GJ, Olson R, Parrello B, Pasternak S, Pearson E, Poon SS, Price GA, Ramakrishnan S, Ranjan P, Ronald PC, Schatz MC, Seaver SMD, Shukla M, Sutormin RA, Syed MH, Thomason J, Tintle NL, Wang D, Xia F, Yoo H, Yoo S, Yu D. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 2018; 36:566-569. [PMID: 29979655 PMCID: PMC6870991 DOI: 10.1038/nbt.4163] [Citation(s) in RCA: 676] [Impact Index Per Article: 112.7] [Indexed: 12/27/2022]
Affiliation(s)
- Adam P Arkin
  - Department of Bioengineering, University of California, Berkeley, California, USA
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Robert W Cottingham
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Christopher S Henry
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Nomi L Harris
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Rick L Stevens
  - Computer Science Department and Computation Institute, University of Chicago, Chicago, Illinois, USA
  - Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA
- Sergei Maslov
  - Biology Department, Brookhaven National Laboratory, Upton, New York, USA
  - Department of Bioengineering and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Paramvir Dehal
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Doreen Ware
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Fernando Perez
  - Computational Research Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Berkeley Institute for Data Science, University of California, Berkeley, California, USA
  - Department of Statistics, University of California, Berkeley, California, USA
- Shane Canon
  - National Energy Research Scientific Computing Center, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Michael W Sneddon
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Matthew L Henderson
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- William J Riehl
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Dan Murphy-Olson
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Stephen Y Chan
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Roy T Kamimura
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Sunita Kumari
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Meghan M Drake
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Thomas S Brettin
  - Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA
- Elizabeth M Glass
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Dylan Chivian
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Dan Gunter
  - Computational Research Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- David J Weston
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Benjamin H Allen
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Jason Baumohl
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Aaron A Best
  - Department of Biology, Hope College, Holland, Michigan, USA
- Ben Bowen
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Steven E Brenner
  - Department of Plant and Microbial Biology, University of California, Berkeley, California, USA
- Christopher C Bun
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- John-Marc Chandonia
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Jer-Ming Chia
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Ric Colasanti
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Neal Conrad
  - Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA
- James J Davis
  - Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA
- Brian H Davison
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Matthew DeJongh
  - Department of Computer Science, Hope College, Holland, Michigan, USA
- Scott Devoid
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Emily Dietrich
  - Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA
- Inna Dubchak
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Janaka N Edirisinghe
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
  - Computation Institute, University of Chicago, Chicago, Illinois, USA
- Gang Fang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
  - New York University Shanghai Campus, Pudong, Shanghai, China
- José P Faria
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Paul M Frybarger
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Wolfgang Gerlach
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Annette Greiner
  - National Energy Research Scientific Computing Center, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- James Gurtowski
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Holly L Haun
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Fei He
  - Biology Department, Brookhaven National Laboratory, Upton, New York, USA
  - Department of Plant Pathology, Kansas State University, Manhattan, Kansas, USA
- Rashmi Jain
  - Department of Plant Pathology and Genome Center, University of California, Davis, Davis, California, USA
  - Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Marcin P Joachimiak
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Kevin P Keegan
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Shinnosuke Kondo
  - Department of Computer Science, Hope College, Holland, Michigan, USA
- Vivek Kumar
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Miriam L Land
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Folker Meyer
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
- Marissa Mills
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Pavel S Novichkov
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Taeyun Oh
  - Department of Plant Pathology and Genome Center, University of California, Davis, Davis, California, USA
  - Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Insilicogen. Inc., Giheung-gu, Yongin-si, Gyeonggi-do, Korea
| | - Gary J Olsen
- Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Robert Olson
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
| | - Bruce Parrello
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
| | - Shiran Pasternak
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Erik Pearson
- Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Sarah S Poon
- Computational Research Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Gavin A Price
- Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Srividya Ramakrishnan
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.,Department of Bioengineering and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA (S.M.); Department of Statistics, University of California, Berkeley, California, USA (F.P.); New York University Shanghai Campus, Pudong, Shanghai, China (G.F.); Department of Plant Pathology, Kansas State University, Manhattan, Kansas, USA (F.H.); Insilicogen. Inc., Giheung-gu, Yongin-si, Gyeonggi-do, Korea (T.O.); Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA (S.R., M.C.S.); Memorial Sloan Kettering Cancer Center, New York, New York, USA (M.H.S.); Dordt College, Sioux Center, Iowa, USA (N.L.T.); Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA (D.W.); Martin Tuchman School of Management, New Jersey Institute of Technology, Newark, New Jersey, USA (D.Y.)
| | - Priya Ranjan
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.,Department of Plant Sciences, University of Tennessee, Knoxville, Tennessee, USA
| | - Pamela C Ronald
- Department of Plant Pathology and Genome Center, University of California, Davis, Davis, California, USA.,Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Michael C Schatz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.,Department of Bioengineering and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA (S.M.); Department of Statistics, University of California, Berkeley, California, USA (F.P.); New York University Shanghai Campus, Pudong, Shanghai, China (G.F.); Department of Plant Pathology, Kansas State University, Manhattan, Kansas, USA (F.H.); Insilicogen. Inc., Giheung-gu, Yongin-si, Gyeonggi-do, Korea (T.O.); Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA (S.R., M.C.S.); Memorial Sloan Kettering Cancer Center, New York, New York, USA (M.H.S.); Dordt College, Sioux Center, Iowa, USA (N.L.T.); Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA (D.W.); Martin Tuchman School of Management, New Jersey Institute of Technology, Newark, New Jersey, USA (D.Y.)
| | - Samuel M D Seaver
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA
| | - Maulik Shukla
- Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA
| | - Roman A Sutormin
- Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Mustafa H Syed
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.,Department of Bioengineering and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA (S.M.); Department of Statistics, University of California, Berkeley, California, USA (F.P.); New York University Shanghai Campus, Pudong, Shanghai, China (G.F.); Department of Plant Pathology, Kansas State University, Manhattan, Kansas, USA (F.H.); Insilicogen. Inc., Giheung-gu, Yongin-si, Gyeonggi-do, Korea (T.O.); Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA (S.R., M.C.S.); Memorial Sloan Kettering Cancer Center, New York, New York, USA (M.H.S.); Dordt College, Sioux Center, Iowa, USA (N.L.T.); Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA (D.W.); Martin Tuchman School of Management, New Jersey Institute of Technology, Newark, New Jersey, USA (D.Y.)
| | - James Thomason
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Nathan L Tintle
- Department of Mathematics, Hope College, Holland, Michigan, USA.,Department of Bioengineering and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA (S.M.); Department of Statistics, University of California, Berkeley, California, USA (F.P.); New York University Shanghai Campus, Pudong, Shanghai, China (G.F.); Department of Plant Pathology, Kansas State University, Manhattan, Kansas, USA (F.H.); Insilicogen. Inc., Giheung-gu, Yongin-si, Gyeonggi-do, Korea (T.O.); Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA (S.R., M.C.S.); Memorial Sloan Kettering Cancer Center, New York, New York, USA (M.H.S.); Dordt College, Sioux Center, Iowa, USA (N.L.T.); Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA (D.W.); Martin Tuchman School of Management, New Jersey Institute of Technology, Newark, New Jersey, USA (D.Y.)
| | - Daifeng Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
| | - Fangfang Xia
- Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA
| | - Hyunseung Yoo
- Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA
| | - Shinjae Yoo
- Computer Science and Math, Computer Science Initiative, Brookhaven National Laboratory, Upton, New York, USA
| | - Dantong Yu
- Computer Science and Math, Computer Science Initiative, Brookhaven National Laboratory, Upton, New York, USA
| |
|
16
|
Xia F, Shukla M, Brettin T, Garcia-Cardona C, Cohn J, Allen JE, Maslov S, Holbeck SL, Doroshow JH, Evrard YA, Stahlberg EA, Stevens RL. Predicting tumor cell line response to drug pairs with deep learning. BMC Bioinformatics 2018; 19:486. [PMID: 30577754 PMCID: PMC6302446 DOI: 10.1186/s12859-018-2509-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND: The National Cancer Institute drug pair screening effort against 60 well-characterized human tumor cell lines (NCI-60) presents an unprecedented resource for modeling combinational drug activity. RESULTS: We present a computational model for predicting cell line response to a subset of drug pairs in the NCI-ALMANAC database. Based on residual neural networks for encoding features as well as predicting tumor growth, our model explains 94% of the response variance. While our best result is achieved with a combination of molecular feature types (gene expression, microRNA and proteome), we show that most of the predictive power comes from drug descriptors. To further demonstrate value in detecting anticancer therapy, we rank the drug pairs for each cell line based on model predicted combination effect and recover 80% of the top pairs with enhanced activity. CONCLUSIONS: We present promising results in applying deep learning to predicting combinational drug response. Our feature analysis indicates screening data involving more cell lines are needed for the models to make better use of molecular features.
Affiliation(s)
- Fangfang Xia
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA.,Computation Institute, The University of Chicago, Chicago, IL, USA
| | - Maulik Shukla
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA
| | - Thomas Brettin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA
| | | | - Judith Cohn
- Computer Science, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Jonathan E Allen
- Computation Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
| | - Sergei Maslov
- Department of Bioengineering and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Susan L Holbeck
- Developmental Therapeutics Branch, National Cancer Institute, Frederick, MD, USA
| | - James H Doroshow
- Developmental Therapeutics Branch, National Cancer Institute, Frederick, MD, USA
| | - Yvonne A Evrard
- Developmental Therapeutics Branch, National Cancer Institute, Frederick, MD, USA
| | - Eric A Stahlberg
- Data Science and Information Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Rick L Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA.,Computation Institute, The University of Chicago, Chicago, IL, USA
| |
|
17
|
Nguyen M, Brettin T, Long SW, Musser JM, Olsen RJ, Olson R, Shukla M, Stevens RL, Xia F, Yoo H, Davis JJ. Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae. Sci Rep 2018; 8:421. [PMID: 29323230 PMCID: PMC5765115 DOI: 10.1038/s41598-017-18972-w] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 09/27/2017] [Accepted: 12/12/2017] [Indexed: 12/20/2022] Open
Abstract
Antimicrobial resistant infections are a serious public health threat worldwide. Whole genome sequencing approaches to rapidly identify pathogens and predict antibiotic resistance phenotypes are becoming more feasible and may offer a way to reduce clinical test turnaround times compared to conventional culture-based methods, and in turn, improve patient outcomes. In this study, we use whole genome sequence data from 1668 clinical isolates of Klebsiella pneumoniae to develop an XGBoost-based machine learning model that accurately predicts minimum inhibitory concentrations (MICs) for 20 antibiotics. The overall accuracy of the model, within ±1 two-fold dilution factor, is 92%. Individual accuracies are ≥90% for 15/20 antibiotics. We show that the MICs predicted by the model correlate with known antimicrobial resistance genes. Importantly, the genome-wide approach described in this study offers a way to predict MICs for isolates without knowledge of the underlying gene content. This study shows that machine learning can be used to build a complete in silico MIC prediction panel for K. pneumoniae and provides a framework for building MIC prediction models for other pathogenic bacteria.
Affiliation(s)
- Marcus Nguyen
- Northern Illinois University, Computation Science, DeKalb, IL, 60115, USA.,University of Chicago, Computation Institute, Chicago, IL, 60637, USA.,Argonne National Laboratory, Computing Environment and Life Sciences, Argonne, IL, 60439, USA
| | - Thomas Brettin
- University of Chicago, Computation Institute, Chicago, IL, 60637, USA.,Argonne National Laboratory, Computing Environment and Life Sciences, Argonne, IL, 60439, USA
| | - S Wesley Long
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, 77030, USA.,Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
| | - James M Musser
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, 77030, USA.,Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
| | - Randall J Olsen
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, 77030, USA.,Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
| | - Robert Olson
- University of Chicago, Computation Institute, Chicago, IL, 60637, USA.,Argonne National Laboratory, Computing Environment and Life Sciences, Argonne, IL, 60439, USA
| | - Maulik Shukla
- University of Chicago, Computation Institute, Chicago, IL, 60637, USA.,Argonne National Laboratory, Computing Environment and Life Sciences, Argonne, IL, 60439, USA
| | - Rick L Stevens
- University of Chicago, Computation Institute, Chicago, IL, 60637, USA.,Argonne National Laboratory, Computing Environment and Life Sciences, Argonne, IL, 60439, USA.,University of Chicago, Department of Computer Science, Chicago, IL, 60439, USA
| | - Fangfang Xia
- University of Chicago, Computation Institute, Chicago, IL, 60637, USA.,Argonne National Laboratory, Computing Environment and Life Sciences, Argonne, IL, 60439, USA
| | - Hyunseung Yoo
- University of Chicago, Computation Institute, Chicago, IL, 60637, USA.,Argonne National Laboratory, Computing Environment and Life Sciences, Argonne, IL, 60439, USA
| | - James J Davis
- University of Chicago, Computation Institute, Chicago, IL, 60637, USA.,Argonne National Laboratory, Computing Environment and Life Sciences, Argonne, IL, 60439, USA
| |
|
18
|
Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Zech Xu Z, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight R. A communal catalogue reveals Earth's multiscale microbial diversity. Nature 2017; 551:457-463. [PMID: 29088705 PMCID: PMC6192678 DOI: 10.1038/nature24621] [Citation(s) in RCA: 1219] [Impact Index Per Article: 174.1] [Received: 03/13/2017] [Accepted: 10/10/2017] [Indexed: 02/07/2023]
Abstract
Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.
Affiliation(s)
- Luke R Thompson
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Biological Sciences and Northern Gulf Institute, University of Southern Mississippi, Hattiesburg, Mississippi, USA.,Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, stationed at Southwest Fisheries Science Center, La Jolla, California, USA
| | - Jon G Sanders
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Daniel McDonald
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Amnon Amir
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Joshua Ladau
- The Gladstone Institutes and University of California San Francisco, San Francisco, California, USA
| | - Kenneth J Locey
- Department of Biology, Indiana University, Bloomington, Indiana, USA
| | - Robert J Prill
- Industrial and Applied Genomics, IBM Almaden Research Center, San Jose, California, USA
| | - Anupriya Tripathi
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Division of Biological Sciences, University of California San Diego, La Jolla, California, USA.,Skaggs School of Pharmacy, University of California San Diego, La Jolla, California, USA
| | - Sean M Gibbons
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.,The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Gail Ackermann
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Jose A Navas-Molina
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
| | - Stefan Janssen
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Evguenia Kopylova
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Yoshiki Vázquez-Baeza
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
| | - Antonio González
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - James T Morton
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
| | - Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, California, USA
| | - Zhenjiang Zech Xu
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Lingjing Jiang
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Family Medicine and Public Health, University of California San Diego, La Jolla, California, USA
| | - Mohamed F Haroon
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Jad Kanbar
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Qiyun Zhu
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Se Jin Song
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Tomasz Kosciolek
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Nicholas A Bokulich
- Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
| | - Joshua Lefler
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Colin J Brislawn
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Gregory Humphrey
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Sarah M Owens
- Biosciences Division, Argonne National Laboratory, Argonne, Illinois, USA
| | - Jarrad Hampton-Marcell
- Biosciences Division, Argonne National Laboratory, Argonne, Illinois, USA.,Department of Biological Sciences, University of Illinois at Chicago, Chicago, Illinois, USA
| | - Donna Berg-Lyons
- BioFrontiers Institute, University of Colorado, Boulder, Colorado, USA
| | - Valerie McKenzie
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA
| | - Noah Fierer
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA.,Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado, USA
| | - Jed A Fuhrman
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
| | - Aaron Clauset
- BioFrontiers Institute, University of Colorado, Boulder, Colorado, USA.,Department of Computer Science, University of Colorado, Boulder, Colorado, USA
| | - Rick L Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, Illinois, USA.,Department of Computer Science, University of Chicago, Chicago, Illinois, USA
| | - Ashley Shade
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA.,Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA.,Program in Ecology, Evolutionary Biology and Behavior, Michigan State University, East Lansing, Michigan, USA
| | - Katherine S Pollard
- The Gladstone Institutes and University of California San Francisco, San Francisco, California, USA
| | - Kelly D Goodwin
- Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, stationed at Southwest Fisheries Science Center, La Jolla, California, USA
| | - Janet K Jansson
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Jack A Gilbert
- Biosciences Division, Argonne National Laboratory, Argonne, Illinois, USA.,Department of Surgery, University of Chicago, Chicago, Illinois, USA
| | - Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.,Center for Microbiome Innovation, University of California San Diego, La Jolla, California, USA
| | | |
|
19
|
Wattam AR, Davis JJ, Assaf R, Boisvert S, Brettin T, Bun C, Conrad N, Dietrich EM, Disz T, Gabbard JL, Gerdes S, Henry CS, Kenyon RW, Machi D, Mao C, Nordberg EK, Olsen GJ, Murphy-Olson DE, Olson R, Overbeek R, Parrello B, Pusch GD, Shukla M, Vonstein V, Warren A, Xia F, Yoo H, Stevens RL. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res 2016; 45:D535-D542. [PMID: 27899627 PMCID: PMC5210524 DOI: 10.1093/nar/gkw1017] [Citation(s) in RCA: 1036] [Impact Index Per Article: 129.5] [Received: 09/02/2016] [Revised: 10/14/2016] [Accepted: 11/09/2016] [Indexed: 12/14/2022] Open
Abstract
The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by ‘virtual integration’ to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.
Affiliation(s)
- Alice R Wattam
- Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
| | - James J Davis
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Rida Assaf
- Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
| | | | - Thomas Brettin
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Christopher Bun
- Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
| | - Neal Conrad
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
| | - Emily M Dietrich
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Terry Disz
- Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
| | - Joseph L Gabbard
- Grado Department of Industrial & Systems Engineering, Virginia Tech, Blacksburg, VA 24060, USA
| | - Svetlana Gerdes
- Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
| | - Christopher S Henry
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
| | - Ronald W Kenyon
- Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
| | - Dustin Machi
- Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
| | - Chunhong Mao
- Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
| | - Eric K Nordberg
- Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
| | - Gary J Olsen
- Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Daniel E Murphy-Olson
- Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Robert Olson
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
| | - Ross Overbeek
- Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA.,Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
| | - Bruce Parrello
- Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA.,Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
| | - Gordon D Pusch
- Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
| | - Maulik Shukla
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
| | | | - Andrew Warren
- Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
| | - Fangfang Xia
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
| | - Hyunseung Yoo
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Rick L Stevens
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.,Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA.,Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
| |
|
20
|
Faria JP, Davis JJ, Edirisinghe JN, Taylor RC, Weisenhorn P, Olson RD, Stevens RL, Rocha M, Rocha I, Best AA, DeJongh M, Tintle NL, Parrello B, Overbeek R, Henry CS. Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation. Front Microbiol 2016; 7:1819. [PMID: 27933038 PMCID: PMC5121216 DOI: 10.3389/fmicb.2016.01819] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/01/2015] [Accepted: 10/28/2016] [Indexed: 01/13/2023] Open
Abstract
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: Hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and find that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, each of which represents increasing degrees of phylogenetic distance from E. coli. 
Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain.
Affiliation(s)
- José P Faria
- Computation Institute, University of Chicago, Chicago, IL, USA; Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA; Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga, Portugal; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
| | - James J Davis
- Computation Institute, University of Chicago, Chicago, IL, USA; Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
| | - Janaka N Edirisinghe
- Computation Institute, University of Chicago, Chicago, IL, USA; Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
| | - Ronald C Taylor
- Computational Biology and Bioinformatics Group, Pacific Northwest National Laboratory (U.S. Dept. of Energy), Richland, WA, USA
| | - Pamela Weisenhorn
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
| | - Robert D Olson
- Computation Institute, University of Chicago, Chicago, IL, USA; Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
| | - Rick L Stevens
- Computation Institute, University of Chicago, Chicago, IL, USA; Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA; Department of Computer Science, Ryerson Physical Laboratory, University of Chicago, Chicago, IL, USA
| | - Miguel Rocha
- Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga, Portugal
| | - Isabel Rocha
- Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga, Portugal
| | - Aaron A Best
- Biology Department, Hope College, Holland, MI, USA
| | | | - Nathan L Tintle
- Department of Mathematics, Statistics and Computer Science, Dordt College, Sioux Center, IA, USA
| | - Bruce Parrello
- Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA; Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
| | - Ross Overbeek
- Computation Institute, University of Chicago, Chicago, IL, USA; Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA; Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
| | - Christopher S Henry
- Computation Institute, University of Chicago, Chicago, IL, USA; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
| |
|
21
|
Edirisinghe JN, Weisenhorn P, Conrad N, Xia F, Overbeek R, Stevens RL, Henry CS. Modeling central metabolism and energy biosynthesis across microbial life. BMC Genomics 2016; 17:568. [PMID: 27502787 PMCID: PMC4977884 DOI: 10.1186/s12864-016-2887-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 10/27/2015] [Accepted: 07/06/2016] [Indexed: 12/22/2022] Open
Abstract
Background: Automatically generated bacterial metabolic models, and even some curated models, lack accuracy in predicting energy yields due to poor representation of key pathways in energy biosynthesis and the electron transport chain (ETC). Further compounding the problem, complex interlinking pathways in genome-scale metabolic models, and the need for extensive gapfilling to support complex biomass reactions, often result in predicting unrealistic yields or unrealistic physiological flux profiles. Results: To overcome this challenge, we developed methods and tools (http://coremodels.mcs.anl.gov) to build high quality core metabolic models (CMM) representing accurate energy biosynthesis based on a well studied, phylogenetically diverse set of model organisms. We compare these models to explore the variability of core pathways across all microbial life, and by analyzing the ability of our core models to synthesize ATP and essential biomass precursors, we evaluate the extent to which the core metabolic pathways and functional ETCs are known for all microbes. 6,600 (80%) of our models were found to have some type of aerobic ETC, whereas 5,100 (62%) have an anaerobic ETC, and 1,279 (15%) do not have any ETC. Using our manually curated ETC and energy biosynthesis pathways with no gapfilling at all, we predict accurate ATP yields for nearly 5586 (70%) of the models under aerobic and anaerobic growth conditions. This study revealed gaps in our knowledge of the central pathways that result in 2,495 (30%) CMMs being unable to produce ATP under any of the tested conditions. We then established a methodology for the systematic identification and correction of inconsistent annotations using core metabolic models coupled with phylogenetic analysis. Conclusions: We predict accurate energy yields based on our improved annotations in energy biosynthesis pathways and the implementation of diverse ETC reactions across the microbial tree of life. We highlighted missing annotations that were essential to energy biosynthesis in our models. We examine the diversity of these pathways across all microbial life and enable the scientific community to explore the analyses generated from this large-scale analysis of over 8000 microbial genomes. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2887-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Janaka N Edirisinghe
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL, 60439, USA; Computer Science Department and Computation Institute, University of Chicago, 5640, South Ellis Avenue, Chicago, IL, 60637, USA
- Pamela Weisenhorn
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL, 60439, USA
- Neal Conrad
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL, 60439, USA
- Fangfang Xia
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL, 60439, USA; Computer Science Department and Computation Institute, University of Chicago, 5640, South Ellis Avenue, Chicago, IL, 60637, USA
- Ross Overbeek
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL, 60439, USA
- Rick L Stevens
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL, 60439, USA; Computer Science Department and Computation Institute, University of Chicago, 5640, South Ellis Avenue, Chicago, IL, 60637, USA
- Christopher S Henry
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL, 60439, USA; Computer Science Department and Computation Institute, University of Chicago, 5640, South Ellis Avenue, Chicago, IL, 60637, USA
22
Dugan VG, Emrich SJ, Giraldo-Calderón GI, Harb OS, Newman RM, Pickett BE, Schriml LM, Stockwell TB, Stoeckert CJ, Sullivan DE, Singh I, Ward DV, Yao A, Zheng J, Barrett T, Birren B, Brinkac L, Bruno VM, Caler E, Chapman S, Collins FH, Cuomo CA, Di Francesco V, Durkin S, Eppinger M, Feldgarden M, Fraser C, Fricke WF, Giovanni M, Henn MR, Hine E, Hotopp JD, Karsch-Mizrachi I, Kissinger JC, Lee EM, Mathur P, Mongodin EF, Murphy CI, Myers G, Neafsey DE, Nelson KE, Nierman WC, Puzak J, Rasko D, Roos DS, Sadzewicz L, Silva JC, Sobral B, Squires RB, Stevens RL, Tallon L, Tettelin H, Wentworth D, White O, Will R, Wortman J, Zhang Y, Scheuermann RH. Standardized metadata for human pathogen/vector genomic sequences. PLoS One 2014; 9:e99979. [PMID: 24936976 PMCID: PMC4061050 DOI: 10.1371/journal.pone.0099979] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 02/06/2014] [Accepted: 05/15/2014] [Indexed: 11/18/2022] Open
Abstract
High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
Affiliation(s)
- Vivien G. Dugan
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America
- Scott J. Emrich
- University of Notre Dame, Notre Dame, Indiana, United States of America
- Omar S. Harb
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Ruchi M. Newman
- Broad Institute, Cambridge, Massachusetts, United States of America
- Brett E. Pickett
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Lynn M. Schriml
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Timothy B. Stockwell
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Dan E. Sullivan
- Cyberinfrastructure Division, Virginia Bioinformatics Institute, Blacksburg, Virginia, United States of America
- Indresh Singh
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Doyle V. Ward
- Broad Institute, Cambridge, Massachusetts, United States of America
- Alison Yao
- National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America
- Jie Zheng
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Tanya Barrett
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, United States of America
- Bruce Birren
- Broad Institute, Cambridge, Massachusetts, United States of America
- Lauren Brinkac
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Vincent M. Bruno
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Elizabet Caler
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Sinéad Chapman
- Broad Institute, Cambridge, Massachusetts, United States of America
- Frank H. Collins
- University of Notre Dame, Notre Dame, Indiana, United States of America
- Valentina Di Francesco
- National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America
- Scott Durkin
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Mark Eppinger
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Claire Fraser
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- W. Florian Fricke
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Maria Giovanni
- National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America
- Matthew R. Henn
- Broad Institute, Cambridge, Massachusetts, United States of America
- Erin Hine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Julie Dunning Hotopp
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Ilene Karsch-Mizrachi
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, United States of America
- Eun Mi Lee
- National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America
- Punam Mathur
- National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America
- Emmanuel F. Mongodin
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Cheryl I. Murphy
- Broad Institute, Cambridge, Massachusetts, United States of America
- Garry Myers
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Karen E. Nelson
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- William C. Nierman
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Julia Puzak
- Kelly Government Solutions, Rockville, Maryland, United States of America
- David Rasko
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- David S. Roos
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Lisa Sadzewicz
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Joana C. Silva
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Bruno Sobral
- Cyberinfrastructure Division, Virginia Bioinformatics Institute, Blacksburg, Virginia, United States of America
- R. Burke Squires
- National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America
- Rick L. Stevens
- Argonne National Laboratory, Lemont, Illinois, United States of America
- Luke Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Herve Tettelin
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- David Wentworth
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Owen White
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Rebecca Will
- Cyberinfrastructure Division, Virginia Bioinformatics Institute, Blacksburg, Virginia, United States of America
- Jennifer Wortman
- Broad Institute, Cambridge, Massachusetts, United States of America
- Yun Zhang
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Richard H. Scheuermann
- J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America
- Department of Pathology, University of California San Diego, San Diego, California, United States of America
- * E-mail:
23
Prieto-García A, Castells MC, Stevens RL. Mast cell-derived htryptase-beta functions as a potent anticoagulant by proteolytically damaging fibrinogen. J Investig Allergol Clin Immunol 2014; 24:286-287. [PMID: 25219118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023] Open
24
Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, Gillespie JJ, Gough R, Hix D, Kenyon R, Machi D, Mao C, Nordberg EK, Olson R, Overbeek R, Pusch GD, Shukla M, Schulman J, Stevens RL, Sullivan DE, Vonstein V, Warren A, Will R, Wilson MJC, Yoo HS, Zhang C, Zhang Y, Sobral BW. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res 2013; 42:D581-91. [PMID: 24225323 PMCID: PMC3965095 DOI: 10.1093/nar/gkt1099] [Citation(s) in RCA: 873] [Impact Index Per Article: 79.4] [Indexed: 12/21/2022] Open
Abstract
The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.
Affiliation(s)
- Alice R Wattam
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24060, USA, Computation Institute, University of Chicago, Chicago, IL 60637, USA, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60637, USA, Grado Department of Industrial & Systems Engineering, Virginia Tech, Blacksburg, VA 24060, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA, Computing, Environment, and Life Sciences, Argonne National Laboratory, Argonne, IL 60637, USA and Nestlé Institute of Health Sciences SA, Campus EPFL, Quartier de L'innovation, Lausanne, Switzerland
25
Tanaka K, Henry CS, Zinner JF, Jolivet E, Cohoon MP, Xia F, Bidnenko V, Ehrlich SD, Stevens RL, Noirot P. Building the repertoire of dispensable chromosome regions in Bacillus subtilis entails major refinement of cognate large-scale metabolic model. Nucleic Acids Res 2012; 41:687-99. [PMID: 23109554 PMCID: PMC3592452 DOI: 10.1093/nar/gks963] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 12/02/2022] Open
Abstract
The nonessential regions in bacterial chromosomes are ill-defined due to incomplete functional information. Here, we establish a comprehensive repertoire of the genome regions that are dispensable for growth of Bacillus subtilis in a variety of media conditions. In complex medium, we attempted deletion of 157 individual regions ranging in size from 2 to 159 kb. A total of 146 deletions were successful in complex medium, whereas the remaining regions were subdivided to identify new essential genes (4) and coessential gene sets (7). Overall, our repertoire covers ∼76% of the genome. We screened for viability of mutant strains in rich defined medium and glucose minimal media. Experimental observations were compared with predictions by the iBsu1103 model, revealing discrepancies that led to numerous model changes, including the large-scale application of model reconciliation techniques. We ultimately produced the iBsu1103V2 model and generated predictions of metabolites that could restore the growth of unviable strains. These predictions were experimentally tested and demonstrated to be correct for 27 strains, validating the refinements made to the model. The iBsu1103V2 model has improved considerably at predicting loss of viability, and many insights gained from the model revisions have been integrated into the Model SEED to improve reconstruction of other microbial models.
Affiliation(s)
- Kosei Tanaka
- INRA, UMR 1319 Micalis, AgroParisTech, UMR Micalis, Jouy-en-Josas F-78350, France
26
Abstract
With recent breakthroughs in experimental microbiology making it possible to synthesize and implant an entire genome to create a living cell, the challenge of constructing a working blueprint for the first truly minimal synthetic organism is more important than ever. Here we review the significant progress made in the design and creation of a minimal organism. We discuss how comparative genomics, gene essentiality data, naturally small genomes, and metabolic modeling are all being applied to produce a catalogue of the biological functions essential for life. We compare the minimal gene sets from three published sources with functions identified in 13 existing gene essentiality datasets. We examine how genome-scale metabolic models have been applied to design a minimal metabolism for growth in simple and complex media. Additionally, we survey the progress of efforts to construct a minimal organism, either through implementation of combinatorial deletions in Bacillus subtilis and Escherichia coli or through the synthesis and implantation of synthetic genomes.
Affiliation(s)
- Christopher Henry
- Mathematics and Computer Science Department, Argonne National Laboratory, Argonne, IL, USA.
27
Henry CS, Zinner JF, Cohoon MP, Stevens RL. iBsu1103: a new genome-scale metabolic model of Bacillus subtilis based on SEED annotations. Genome Biol 2009; 10:R69. [PMID: 19555510 PMCID: PMC2718503 DOI: 10.1186/gb-2009-10-6-r69] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.3] [Received: 03/01/2009] [Revised: 05/18/2009] [Accepted: 06/25/2009] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND: Bacillus subtilis is an organism of interest because of its extensive industrial applications, its similarity to pathogenic organisms, and its role as the model organism for Gram-positive, sporulating bacteria. In this work, we introduce a new genome-scale metabolic model of B. subtilis 168 called iBsu1103. This new model is based on the annotated B. subtilis 168 genome generated by the SEED, one of the most up-to-date and accurate annotations of B. subtilis 168 available. RESULTS: The iBsu1103 model includes 1,437 reactions associated with 1,103 genes, making it the most complete model of B. subtilis available. The model also includes Gibbs free energy change (ΔrG'°) values for 1,403 (97%) of the model reactions estimated by using the group contribution method. These data were used with an improved reaction reversibility prediction method to identify 653 (45%) irreversible reactions in the model. The model was validated against an experimental dataset consisting of 1,500 distinct conditions and was optimized by using an improved model optimization method to increase model accuracy from 89.7% to 93.1%. CONCLUSIONS: Basing the iBsu1103 model on the annotations generated by the SEED significantly improved the model completeness and accuracy compared with the most recent previously published model. The enhanced accuracy of the iBsu1103 model also demonstrates the efficacy of the improved reaction directionality prediction method in accurately identifying irreversible reactions in the B. subtilis metabolism. The proposed improved model optimization methodology was also demonstrated to be effective in minimally adjusting model content to improve model accuracy.
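As a rough illustration of how standard Gibbs energy estimates can feed a reversibility call, the sketch below applies one common thermodynamic criterion: a reaction is treated as irreversible when the sign of its Gibbs energy change cannot flip anywhere within an assumed metabolite concentration range. This is not the improved prediction method described in the abstract; the function names, the concentration bounds (1e-5 to 2e-2 M) and the temperature are assumptions chosen only for the example.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)
T = 298.15    # assumed temperature in K


def delta_g_range(dg0, stoich, c_min=1e-5, c_max=2e-2):
    """Return (lowest, highest) reaction Gibbs energy in kJ/mol.

    dg0    : standard Gibbs energy change of the reaction (kJ/mol)
    stoich : stoichiometric coefficients, negative for reactants,
             positive for products
    c_min, c_max : assumed bounds on every metabolite concentration (mol/L)
    """
    lowest = highest = dg0
    for n in stoich:
        # n * ln(c) is monotonic in c, so its extremes lie at the bounds.
        at_min = R * T * n * math.log(c_min)
        at_max = R * T * n * math.log(c_max)
        lowest += min(at_min, at_max)
        highest += max(at_min, at_max)
    return lowest, highest


def classify(dg0, stoich):
    """Label a reaction by whether its Gibbs energy can change sign."""
    lowest, highest = delta_g_range(dg0, stoich)
    if highest < 0:
        return "irreversible (forward)"
    if lowest > 0:
        return "irreversible (reverse)"
    return "reversible"
```

For example, `classify(-60.0, [-1, -1, 1, 1])` stays negative over the whole assumed concentration range, whereas `classify(-5.0, [-1, 1])` can change sign and is therefore labelled reversible.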
Affiliation(s)
- Christopher S Henry
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL 60439, USA
- Jenifer F Zinner
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL 60439, USA
- Computation Institute, The University of Chicago, S. Ellis Avenue, Chicago, IL 60637, USA
- Matthew P Cohoon
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL 60439, USA
- Rick L Stevens
- Mathematics and Computer Science Department, Argonne National Laboratory, S. Cass Avenue, Argonne, IL 60439, USA
- Computation Institute, The University of Chicago, S. Ellis Avenue, Chicago, IL 60637, USA
28
Abstract
Proteoglycan research on cells that participate in immune responses has progressed from the early novel finding that heparin proteoglycans are present in the secretory granules of the connective tissue mast cell to the more recent findings that mucosal mast cells and natural killer (NK) cells possess chondroitin sulphate proteoglycans in their granules. Characterization studies of these intracellular proteoglycans have revealed that they all possess peptide cores which are very resistant to proteolytic degradation. Their glycosaminoglycans, however, differ in such parameters as the type of hexosamine, location of sulphation, degree of sulphation, or extent of epimerization of the uronic acid. Amino acid compositional analyses of heparin proteoglycans from rat connective tissue mast cells and chondroitin sulphate E proteoglycans from mouse mucosal mast cells indicate that their peptide cores are homologous to, but possibly distinct from, one another. It is not yet known if these differences reflect a species variation, are due to different post-translational proteolytic processing, or are the result of expression of distinct genes coding for different peptide cores. The proteoglycans of mast cells and natural killer cells are packaged in the granules with cationic proteins. In mast cells these proteins have been shown to be serine proteases, and when bound to the acidic proteoglycans their enzymic activity is inhibited. Since the type of glycosaminoglycan linked to the proteoglycan has been found to be a characteristic of that cell, the structure of the cell-associated proteoglycan has become one of the markers used to distinguish cells phenotypically. By following the expression of different proteoglycans during differentiation, the relationship of the two subclasses of mast cells has been determined.
29
Abstract
Simulations of large neural networks have the potential to contribute uniquely to the study of epilepsy, from the effects of extremely local changes in neuron environment and behavior, to the effects of large scale wiring anomalies. Currently, simulations with sufficient detail in the neuron model, however, are limited to cell counts that are far smaller than scales measured by typical probes. Furthermore, it is likely that future simulations will follow the path that large-scale simulations in other fields have and include hierarchically interacting components covering different scales and different biophysics. The resources needed for problem solving in this domain call for petascale computing, that is, computing with supercomputers capable of 10^15 operations a second and holding datasets of 10^15 bytes in memory. We will lay out the structure of our simulation of epileptiform electrical activity in the neocortex, describe experiments and models of its scaling behavior in large cluster supercomputers, identify tight spots in this behavior, and project the performance onto a candidate next generation computing platform.
Affiliation(s)
- M Hereld
- Futures Laboratory, Mathematics and Computer Science, Argonne National Laboratory, Argonne, IL, USA
30
van Drongelen W, Lee HC, Koch H, Elsen F, Carroll MS, Hereld M, Stevens RL. Interaction between cellular voltage-sensitive conductance and network parameters in a model of neocortex can generate epileptiform bursting. Conf Proc IEEE Eng Med Biol Soc 2007; 2004:4003-5a. [PMID: 17271176 DOI: 10.1109/iembs.2004.1404118] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
We examined the effects of both intrinsic neuronal membrane properties and network parameters on oscillatory activity in a model of neocortex. A scalable network model with six different cell types was built with the pGENESIS neural simulator. The neocortical network consisted of two types of pyramidal cells and four types of inhibitory interneurons. All cell types contained both fast sodium and delayed rectifier potassium channels for generation of action potentials. A subset of the pyramidal neurons contained an additional slow inactivating (persistent) sodium current (NaP). The neurons with the NaP current showed spontaneous bursting activity in the absence of external stimulation. The model also included a routine to calculate a simulated electroencephalogram (EEG) trace from the population activity. This revealed emergent network behavior which ranged from desynchronized activity to different types of seizure-like bursting patterns. At settings with weaker excitatory network effects, the propensity to generate seizure-like behavior increased. Strong excitatory network connectivity destroyed oscillatory behavior, whereas weak connectivity enhanced the relative importance of the spontaneously bursting cells. Our findings are in contradiction with the general opinion that strong excitatory synaptic and/or insufficient inhibition effects are associated with seizure initiation, but are in agreement with previously reported behavior in neocortex.
Affiliation(s)
- W van Drongelen
- Department of Pediatrics, The University of Chicago, Chicago, IL, USA
31
Hereld M, Lee HC, van Drongelen W, Stevens RL. Image-based configuration and interaction for large neural network simulations. BMC Neurosci 2007. [PMCID: PMC4436188 DOI: 10.1186/1471-2202-8-s2-p22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
32
Mil-Homens M, Stevens RL, Cato I, Abrantes F. Regional geochemical baselines for Portuguese shelf sediments. Environ Pollut 2007; 148:418-27. [PMID: 17280758 DOI: 10.1016/j.envpol.2006.12.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/18/2006] [Revised: 11/28/2006] [Accepted: 12/11/2006] [Indexed: 05/13/2023]
Abstract
Metal concentrations (Al, Cr, Cu, Ni, Pb and Zn) from the DGM-INETI archive data set have been examined for sediments collected during the 1970s from 267 sites on the Portuguese shelf. Due to the differences in the oceanographic and sedimentological settings between the western and Algarve coasts, the archive data set is split into two segments. For both shelf segments, regional geochemical baselines (RGB) are defined using aluminium as a reference element. Seabed samples recovered in 2002 from four distinct areas of the Portuguese shelf are superimposed on these models to identify and compare possible metal enrichments relative to the natural distribution. Metal enrichments associated with anthropogenic influences are identified in three samples collected near the Tejo River and are characterised by the highest enrichment factors (EF; EF(Pb)<3, EF(Zn)<4). EF values close to 1 suggest a largely natural origin for metal distributions in sediments from the other areas included in the study.
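The baseline-and-enrichment idea in this abstract can be sketched in a few lines. The example assumes, purely for illustration, that the regional baseline is a least-squares straight line of metal concentration against aluminium and that the enrichment factor is the measured metal content divided by the baseline value at the same aluminium content; the abstract does not state the exact regression form or EF definition used. Function names and all numbers are hypothetical.

```python
import numpy as np


def fit_baseline(al, metal):
    """Fit metal = slope * Al + intercept to background (archive) samples."""
    slope, intercept = np.polyfit(al, metal, 1)
    return slope, intercept


def enrichment_factor(al_sample, metal_sample, slope, intercept):
    """Measured metal divided by the baseline prediction at the same Al."""
    expected = slope * np.asarray(al_sample, dtype=float) + intercept
    return np.asarray(metal_sample, dtype=float) / expected


# Made-up values for illustration only (Al in %, Pb in mg/kg).
archive_al = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
archive_pb = np.array([6.0, 10.0, 14.0, 18.0, 22.0])  # lies on Pb = 4*Al + 2

slope, intercept = fit_baseline(archive_al, archive_pb)

# Two hypothetical recent samples with the same Al but different Pb.
ef = enrichment_factor([3.0, 3.0], [14.0, 35.0], slope, intercept)
```

With these invented numbers the first sample sits on the baseline (EF of 1, natural origin) while the second carries 2.5 times the expected Pb, the kind of value that would flag a possible anthropogenic contribution.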
Affiliation(s)
- M Mil-Homens
- Departamento de Geologia Marinha, Instituto Nacional de Engenharia, Tecnologia e Inovação, I.P., Estrada da Portela, Apartado 7586, 2721-866 Alfragide, Portugal.
33
Benayoun M, Dwyer J, Lee HC, Hereld M, Stevens RL, van Drongelen W. Simulated-annealing as a tool to identify parameter values associated with epileptiform activity in single-neuron and network compartmental models. BMC Neurosci 2007. [PMCID: PMC4434830 DOI: 10.1186/1471-2202-8-s2-p23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
34
Abstract
SUMMARY: Large simulations have become increasingly complex in many fields, tending to incorporate scale-dependent modeling and algorithms and wide-ranging physical influences. This scale of simulation sophistication has not yet been matched in neuroscience. The authors describe a framework aimed at enabling natural interaction with complex simulations: their configuration, initial conditions, monitoring, and analysis. The architecture is built on three cornerstone components: active probes, adaptive data capture, and visual interface. The resulting synthesis will enable interactive exploration of live simulations running on supercomputing platforms.
Affiliation(s)
- Mark Hereld
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois 60637-1470, USA.
35
Abstract
SUMMARY: Seizures in pediatric epilepsy are often associated with spreading, repetitive bursting activity in neocortex. The authors examined onset and propagation of seizure-like activity using a computational model of cortical circuitry. The model includes two pyramidal cell types and four types of inhibitory interneurons. Each neuron is represented by a multicompartmental model with biophysically realistic ion channels. The authors determined the role of bursting neurons and found that their capability of driving network oscillations is most prominent in networks with either weak or relatively strong excitatory synaptic coupling. Synaptic coupling strength was varied in a separate set of simulations to examine its role in network bursting. Oscillations both between cortical layers (vertical oscillations) and between cortical areas (horizontal oscillations) emerge at moderate excitatory coupling strengths. For horizontal propagation, existence of a fast-conducting fiber system and its properties are critical. Seizure-like oscillatory activity may originate from single neurons or small networks, and that activity may propagate in two principal fashions: one that can be represented by a unidirectional (pacemaker)-type process and the other as multi- or bidirectional propagating waves. The frequency of the bursting patterns relates to underlying propagating activity that can either sustain or disrupt the ongoing oscillation.
Affiliation(s)
- Wim van Drongelen
- Department of Pediatrics, The University of Chicago, Chicago, Illinois 60637-1470, USA.
|
36
|
Mil-Homens M, Stevens RL, Boer W, Abrantes F, Cato I. Pollution history of heavy metals on the Portuguese shelf using 210Pb-geochronology. Sci Total Environ 2006; 367:466-80. [PMID: 16701790 DOI: 10.1016/j.scitotenv.2006.03.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 09/16/2005] [Revised: 03/24/2006] [Accepted: 03/28/2006] [Indexed: 05/09/2023]
Abstract
Although high energy shelves are usually ignored in environmental studies, the fine fractions of sandy deposits and the restricted areas of silty clayey deposits record contaminant loading history and can represent important components for understanding processes and fluxes in a system perspective. The main aim of this work is to identify trends in historical pollution in three accumulation areas of the western Portuguese shelf that are characterised by different oceanographic and sedimentologic conditions. The vertical distribution of major (Al, Ca, Fe, Mg, Mn and S) and trace elements (Cr, Cu, Li, Ni, Pb, Sc, Sr and Zn), (210)Pb and the fine fraction contents, are documented. The (210)Pb distributions with depth confirm recent accumulation in the study areas and provide a chronologic basis. Factor analysis is used to classify the number of variables into detrital, biogenic and anthropogenic factors that may reflect common metal sources or sedimentary processes. Related to both bioturbation and hydrodynamic processes occurring at water-depths greater than 100 m, the northern Ave-Douro area has a 5-7 cm mixed-layer at the surface affecting the deposition signal. In the Lis area, on the central shelf, heavy metal contents normalised to aluminium indicate slight anthropogenic enrichment in Pb and Zn contents since the beginning of the 20th century and higher levels from the 1950s until the present. These historical trends can reflect changes in the industrial activity and in the combustion of leaded gasoline. Down-core profiles from the southern Mira area reveal metal enrichments that may be caused by early diagenetic remobilisation and precipitation. The use of dated profiles extending across the record of industrial development allows both enrichment factors and excess (anthropogenic) metal fluxes to be compared with historical changes.
Affiliation(s)
- M Mil-Homens
- Departamento de Geologia Marinha, Instituto Nacional de Engenharia, Tecnologia e Inovação, Estrada da Portela, Apartado 7586, 2721-866 Alfragide, Portugal.
|
37
|
van Drongelen W, Koch H, Elsen FP, Lee HC, Mrejeru A, Doren E, Marcuccilli CJ, Hereld M, Stevens RL, Ramirez JM. Role of persistent sodium current in bursting activity of mouse neocortical networks in vitro. J Neurophysiol 2006; 96:2564-77. [PMID: 16870839 DOI: 10.1152/jn.00446.2006] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.3] [Indexed: 11/22/2022] Open
Abstract
Most types of electrographic epileptiform activity can be characterized by isolated or repetitive bursts in brain electrical activity. This observation is our motivation to determine mechanisms that underlie bursting behavior of neuronal networks. Here we show that the persistent sodium (Na(P)) current in mouse neocortical slices is associated with cellular bursting and our data suggest that these cells are capable of driving networks into a bursting state. This conclusion is supported by the following observations. 1) Both low concentrations of tetrodotoxin (TTX) and riluzole reduce and eventually stop network bursting while they simultaneously abolish intrinsic bursting properties and sensitivity levels to electrical stimulation in individual intrinsically bursting cells. 2) The sensitivity levels of regular spiking neurons are not significantly affected by riluzole or TTX at the termination of network bursting. 3) Propagation of cellular bursting in a neuronal network depended on excitatory connectivity and disappeared on bath application of CNQX (20 µM) + CPP (10 µM). 4) Voltage-clamp measurements show that riluzole (20 µM) and very low concentrations of TTX (50 nM) attenuate Na(P) currents in the neural membrane within a 1-min interval after bath application of the drug. 5) Recordings of synaptic activity demonstrate that riluzole at this concentration does not affect synaptic properties. 6) Simulations with a neocortical network model including different types of pyramidal cells, inhibitory interneurons, neurons with and without Na(P) currents, and recurrent excitation confirm the essence of our experimental observations that Na(P) conductance can be a critical factor sustaining slow population bursting.
Affiliation(s)
- Wim van Drongelen
- Department of Pediatrics, The University of Chicago, Chicago, IL 60637-1470, USA.
|
38
|
van Drongelen W, Lee HC, Hereld M, Chen Z, Elsen FP, Stevens RL. Emergent epileptiform activity in neural networks with weak excitatory synapses. IEEE Trans Neural Syst Rehabil Eng 2005; 13:236-41. [PMID: 16003905 DOI: 10.1109/tnsre.2005.847387] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Indexed: 11/06/2022]
Abstract
Brain electrical activity recorded during an epileptic seizure is frequently associated with rhythmic discharges in cortical networks. Current opinion in clinical neurophysiology is that strongly coupled networks and cellular bursting are prerequisites for the generation of epileptiform activity. Contrary to expectations, we found that weakly coupled cortical networks can create synchronized cellular activity and seizure-like bursting. Evaluation of a range of synaptic parameters in a detailed computational model revealed that seizure-like activity occurs when the excitatory synapses are weakened. Guided by this observation, we confirmed experimentally that, in mouse neocortical slices, a pharmacological reduction of excitatory synaptic transmission elicited sudden onset of repetitive network bursting. Our finding provides powerful evidence that onset of seizures can be associated with a reduction in synaptic transmission. These results open a new avenue to explore network synchrony and may ultimately lead to a rational approach to treatment of network pathology in epilepsy.
Affiliation(s)
- Wim van Drongelen
- Department of Pediatrics, Computation Institute of the University of Chicago, Chicago, IL 60637-1470, USA.
|
39
|
van Drongelen W, Lee HC, Hereld M, Jones D, Cohoon M, Elsen F, Papka ME, Stevens RL. Simulation of neocortical epileptiform activity using parallel computing. Neurocomputing 2004. [DOI: 10.1016/j.neucom.2004.01.186] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
|
40
|
Whitelegge JP, Ahn V, Norris AJ, Sung H, Waring A, Stevens RL, Fluharty CB, Prive G, Faull KF, Fluharty AL. Characterization of a recombinant molecule covalently indistinguishable from human cerebroside-sulfate activator protein (CSAct or Saposin B). Cell Mol Biol (Noisy-le-grand) 2003; 49:799-807. [PMID: 14528917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
Humans deficient in the cerebroside-sulfate activator protein (CSAct or Saposin B) are unable to catabolize sulfatide and other glycosphingolipids leading to their accumulation and neurodegenerative disease. Clinically this usually manifests as a form of metachromatic leukodystrophy (MLD). CSAct is a small water-soluble glycoprotein that apparently functions in the lysosome to solubilize sulfatide and other lipids enabling their interaction with soluble lysosomal hydrolases. CSAct activity can be measured in vitro by assay of its ability to activate sulfatide-sulfate hydrolysis by arylsulfatase A or ex vivo by its ability to functionally complement CSAct deficient fibroblast cell lines derived from MLD patients. A recombinant form of CSAct has been expressed in E. coli and processed in vitro to a form covalently indistinguishable from deglycosylated human CSAct isolated from human urine. Size-exclusion chromatography in combination with multi-angle laser-light scattering (SEC-MALLS) measurements demonstrate that both native and recombinant forms of the molecule behave as a dimer in the pH range 7.0-4.5. The CSAct activity assay showed that both recombinant and deglycosylated human urine CSAct efficiently activated sulfatide sulfate hydrolysis and provided functional complementation of CSAct-deficient cells. However, a D21N mutant form of recombinant CSAct could not functionally complement these cells despite full activity in the in vitro assay. It is concluded that while glycosylation is unnecessary for in vitro and ex vivo activity of CSAct, modification of the native N21 is necessary to prevent loss of ex vivo activity, possibly via protection from degradation.
Affiliation(s)
- J P Whitelegge
- The Pasarow Mass Spectrometry Laboratory, Department of Psychiatry and Biobehavioral Sciences, The Neuropsychiatric Institute, University of California, 405 Hilgard Ave., Los Angeles, CA 90095, USA.
|
41
|
Wong GW, Yasuda S, Madhusudhan MS, Li L, Yang Y, Krilis SA, Sali A, Stevens RL. Human tryptase epsilon (PRSS22), a new member of the chromosome 16p13.3 family of human serine proteases expressed in airway epithelial cells. J Biol Chem 2001; 276:49169-82. [PMID: 11602603 DOI: 10.1074/jbc.m108677200] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 11/06/2022] Open
Abstract
Probing of the GenBank expressed sequence tag (EST) data base with varied human tryptase cDNAs identified two truncated ESTs that subsequently were found to encode overlapping portions of a novel human serine protease (designated tryptase epsilon or protease, serine S1 family member 22 (PRSS22)). The tryptase epsilon gene resides on chromosome 16p13.3 within a 2.5-Mb complex of serine protease genes. Although at least 7 of the 14 genes in this complex encode enzymatically active proteases, only one tryptase epsilon-like gene was identified. The trachea and esophagus were found to contain the highest steady-state levels of the tryptase epsilon transcript in adult humans. Although the tryptase epsilon transcript was scarce in adult human lung, it was present in abundance in fetal lung. Thus, the tryptase epsilon gene is expressed in the airways in a developmentally regulated manner that is different from that of other human tryptase genes. At the cellular level, tryptase epsilon is a major product of normal pulmonary epithelial cells, as well as varied transformed epithelial cell lines. Enzymatically active tryptase epsilon is also constitutively secreted from these cells. The amino acid sequence of human tryptase epsilon is 38-44% identical to those of human tryptase alpha, tryptase beta I, tryptase beta II, tryptase beta III, transmembrane tryptase/tryptase gamma, marapsin, and Esp-1/testisin. Nevertheless, comparative protein structure modeling and functional studies using recombinant material revealed that tryptase epsilon has a substrate preference distinct from that of its other family members. These data indicate that the products of the chromosome 16p13.3 complex of tryptase genes evolved to carry out varied functions in humans.
Affiliation(s)
- G W Wong
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
|
42
|
Huang C, De Sanctis GT, O'Brien PJ, Mizgerd JP, Friend DS, Drazen JM, Brass LF, Stevens RL. Evaluation of the substrate specificity of human mast cell tryptase beta I and demonstration of its importance in bacterial infections of the lung. J Biol Chem 2001; 276:26276-84. [PMID: 11335723 DOI: 10.1074/jbc.m102356200] [Citation(s) in RCA: 121] [Impact Index Per Article: 5.3] [Indexed: 11/06/2022] Open
Abstract
Human pulmonary mast cells (MCs) express tryptases alpha and beta I, and both granule serine proteases are exocytosed during inflammatory events. Recombinant forms of these tryptases were generated for the first time to evaluate their substrate specificities at the biochemical level and then to address their physiologic roles in pulmonary inflammation. Analysis of a tryptase-specific, phage display peptide library revealed that tryptase beta I prefers to cleave peptides with 1 or more Pro residues flanked by 2 positively charged residues. Although recombinant tryptase beta I was unable to activate cultured cells that express different types of protease-activated receptors, the numbers of neutrophils increased >100-fold when enzymatically active tryptase beta I was instilled into the lungs of mice. In contrast, the numbers of lymphocytes and eosinophils in the airspaces did not change significantly. More important, the tryptase beta I-treated mice exhibited normal airway responsiveness. Neutrophils did not extravasate into the lungs of tryptase alpha-treated mice. Thus, this is the first study to demonstrate that the two nearly identical human MC tryptases are functionally distinct in vivo. When MC-deficient W/W(v) mice were given enzymatically active tryptase beta I or its inactive zymogen before pulmonary infection with Klebsiella pneumoniae, tryptase beta I-treated W/W(v) mice had fewer viable bacteria in their lungs relative to zymogen-treated W/W(v) mice. Because neutrophils are required to combat bacterial infections, human tryptase beta I plays a critical role in the antibacterial host defenses of the lung by recruiting neutrophils in a manner that does not alter airway reactivity.
Affiliation(s)
- C Huang
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
|
43
|
Li Y, Li L, Wadley R, Reddel SW, Qi JC, Archis C, Collins A, Clark E, Cooley M, Kouts S, Naif HM, Alali M, Cunningham A, Wong GW, Stevens RL, Krilis SA. Mast cells/basophils in the peripheral blood of allergic individuals who are HIV-1 susceptible due to their surface expression of CD4 and the chemokine receptors CCR3, CCR5, and CXCR4. Blood 2001; 97:3484-90. [PMID: 11369641 DOI: 10.1182/blood.v97.11.3484] [Citation(s) in RCA: 65] [Impact Index Per Article: 2.8] [Indexed: 11/20/2022] Open
Abstract
A population of metachromatic cells with mast cell (MC) and basophil features was identified recently in the peripheral blood of patients with several allergic disorders. This study now shows that these metachromatic cells express on their surface the high-affinity IgE receptor (FcepsilonRI), CD4, and the chemokine receptors CCR3, CCR5, and CXCR4, but not the T-cell surface protein CD3 and the monocyte/macrophage surface protein CD68. This population of MCs/basophils can be maintained ex vivo for at least 2 weeks, and a comparable population of cells can be generated in vitro from nongranulated hematopoietic CD3(-)/CD4(+)/CD117(-) progenitors. Both populations of MCs/basophils are susceptible to an M-tropic strain of human immunodeficiency virus 1 (HIV-1). Finally, many patients with acquired immunodeficiency syndrome have HIV-1-infected MCs/basophils in their peripheral blood. Although it is well known that HIV-1 can infect CD4(+) T cells and monocytes, this finding is the first example of a human MC or basophil shown to be susceptible to the retrovirus. (Blood. 2001;97:3484-3490)
Affiliation(s)
- Y Li
- Department of Medicine, University of New South Wales, New South Wales, Australia
|
44
|
Wong GW, Li L, Madhusudhan MS, Krilis SA, Gurish MF, Rothenberg ME, Sali A, Stevens RL. Tryptase 4, a new member of the chromosome 17 family of mouse serine proteases. J Biol Chem 2001; 276:20648-58. [PMID: 11259427 DOI: 10.1074/jbc.m010422200] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022] Open
Abstract
Genomic blot analysis raised the possibility that uncharacterized tryptase genes reside on chromosome 17 at the complex containing the three genes that encode mouse mast cell protease (mMCP) 6, mMCP-7, and transmembrane tryptase (mTMT). Probing of GenBank's expressed sequence tag data base with these three tryptase cDNAs resulted in the identification of an expressed sequence tag that encodes a portion of a novel mouse serine protease (now designated mouse tryptase 4 (mT4) because it is the fourth member of this family). 5'- and 3'-rapid amplification of cDNA ends approaches were carried out to deduce the nucleotide sequence of the full-length mT4 transcript. This information was then used to clone its approximately 5.0-kilobase pair gene. Chromosome mapping analysis of its gene, sequence analysis of its transcript, and comparative protein structure modeling of its translated product revealed that mT4 is a new member of the chromosome 17 family of mouse tryptases. mT4 is 40-44% identical to mMCP-6, mMCP-7, and mTMT, and this new serine protease has all of the structural features of a functional tryptase. Moreover, mT4 is enzymatically active when expressed in insect cells. Due to its 17-mer hydrophobic domain at its C terminus, mT4 is a membrane-anchored tryptase more analogous to mTMT than the other members of its family. As assessed by RNA blot, reverse transcriptase-polymerase chain reaction, and/or in situ hybridization analysis, mT4 is expressed in interleukin-5-dependent mouse eosinophils, as well as in ovaries and testes. The observation that recombinant mT4 is preferentially retained in the endoplasmic reticulum of transiently transfected COS-7 cells suggests a convertase-like role for this integral membrane serine protease.
Affiliation(s)
- G W Wong
- Department of Medicine, Brigham and Women's Hospital, Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
|
45
|
Faull KF, Johnson J, Kim MJ, To T, Whitelegge JP, Stevens RL, Fluharty CB, Fluharty AL. Structure of the asparagine-linked sugar chains of porcine kidney and human urine cerebroside sulfate activator protein. J Mass Spectrom 2000; 35:1416-1424. [PMID: 11180632 DOI: 10.1002/1096-9888(200012)35:12<1416::aid-jms75>3.0.co;2-k] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/23/2023]
Abstract
The specific sugar residues and their linkages in the oligosaccharides from pig kidney and human urine cerebroside sulfate activator proteins (saposin B), although previously hypothesized, have been unambiguously characterized. Exhaustive sequential exoglycosidase digestion of the trimethyl-p-aminophenyl derivatives, followed by either matrix-assisted laser desorption/ionization and/or mass spectrometry, was used to define the residues and their linkages. The oligosaccharides were enzymatically released from the proteins by treatment with peptidyl-N-glycosidase F and separated from the proteins by reversed-phase high-performance liquid chromatography (HPLC). Reducing termini were converted to the trimethyl-p-aminophenyl derivative and the samples were further purified by normal-phase HPLC. The derivatized carbohydrates were then treated sequentially with a series of exoglycosidases of defined specificity, and the products of each digestion were examined by mass spectrometry. The pentasaccharides from pig kidney and human urine protein were shown to be of the asparagine-linked complex type composed of mannose-alpha 1-6-mannose-beta 1-4-N-acetylglucosamine-N-acetylglucosamine(alpha 1-6-fucose). This highly degraded structure probably represents the final product of intra-lysosomal exoglycosidase digestion. Oligosaccharide sequencing by specific exoglycosidase degradation coupled with mass spectrometry is more rapid than conventional oligosaccharide sequencing. The procedures developed will be useful for sequencing other oligosaccharides including those from other members of the lipid-binding protein class to which cerebroside sulfate activator belongs. (c) 2000 John Wiley & Sons, Ltd.
Affiliation(s)
- K F Faull
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, California, 90095, USA.
|
46
|
Whitelegge JP, Penn B, To T, Johnson J, Waring A, Sherman M, Stevens RL, Fluharty CB, Faull KF, Fluharty AL. Methionine oxidation within the cerebroside-sulfate activator protein (CSAct or Saposin B). Protein Sci 2000; 9:1618-30. [PMID: 11045609 PMCID: PMC2144706 DOI: 10.1110/ps.9.9.1618] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.9] [Indexed: 10/21/2022]
Abstract
The cerebroside-sulfate activator protein (CSAct or Saposin B) is a small water-soluble glycoprotein that plays an essential role in the metabolism of certain glycosphingolipids, especially sulfatide. Deficiency of CSAct in humans leads to sulfatide accumulation and neurodegenerative disease. CSAct activity can be measured in vitro by assay of its ability to activate sulfatide-sulfate hydrolysis by arylsulfatase A. CSAct has seven methionine residues and a mass of 8,845 Da when deglycosylated. Mildly oxidized, deglycosylated CSAct (+16 Da), separated from nonoxidized CSAct by reversed-phase high-performance liquid chromatography (RP-HPLC), showed significant modulation of the in vitro activity. Because oxidation partially protected against CNBr cleavage and could largely be reversed by treatment with dithiothreitol, it was concluded that the major modification was conversion of a single methionine to its sulfoxide. High-resolution RP-HPLC separated mildly oxidized CSAct into seven or more different components with shorter retention times than nonoxidized CSAct. Mass spectrometry showed these components to have identical mass (+16 Da). The shorter retention times are consistent with increased polarity accompanying oxidation of surface-exposed methionyl side chains, in general accordance with the existing molecular model. A mass-spectrometric CNBr mapping protocol allowed identification of five of the seven possible methionine-sulfoxide CSAct oxoforms. The most dramatic suppression of activity occurred upon oxidation of Met61 (26% of control) with other residues in the Q60MMMHMQ66 motif falling in the 30-50% activity range. Under conditions of oxidative stress, accumulation of minimally oxidized CSAct protein in vivo could perturb metabolism of sulfatide and other glycosphingolipids. This, in turn, could contribute to the onset and progression of neurodegenerative disease, especially in situations where the catabolism of these materials is marginal.
Affiliation(s)
- J P Whitelegge
- Pasarow Mass Spectrometry Laboratory, University of California, Los Angeles 90095, USA.
|
47
|
Friend DS, Gurish MF, Austen KF, Hunt J, Stevens RL. Senescent jejunal mast cells and eosinophils in the mouse preferentially translocate to the spleen and draining lymph node, respectively, during the recovery phase of helminth infection. J Immunol 2000; 165:344-52. [PMID: 10861071 DOI: 10.4049/jimmunol.165.1.344] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.9] [Indexed: 11/19/2022]
Abstract
Because mice infected with Trichinella spiralis experience a pronounced, but transient, mastocytosis and eosinophilia in their intestine, this disease model was used to follow the fate of senescent T cell-dependent mast cells (MCs) and eosinophils. Very few MCs or eosinophils undergoing apoptosis were found in the jejunum during the resolution phase of the infection, even though apoptotic MCs were common in the large intestine. Although the mesenteric draining lymph nodes contained large numbers of apoptotic eosinophils, MCs were rarely found at this location. During the recovery phase, large numbers of MCs were present in the spleen, and many of these cells possessed segmented nuclei. These splenic MCs were not proliferating. Although MCs from the jejunum and spleen of noninfected mice failed to express mouse MC protease (mMCP) 9, essentially all of the MCs in the jejunal submucosa and spleen of T. spiralis-infected mice expressed this serine protease during the recovery phase. The MCs in the jejunum expressed mMCP-9 before any mMCP-9-containing cells could be detected in the spleen. The fact that mMCP-9-containing MCs were detected in splenic blood vessels as these cells began to disappear from the jejunum supports the view that many jejunal MCs translocate to the spleen during the recovery phase of the infection. During this translocation process, some senescent jejunal MCs undergo nuclear segmentation. These studies reveal for the first time different exit and disposal pathways for T cell-dependent eosinophils and MCs after their expansion in the jejunum during a helminth infection.
Affiliation(s)
- D S Friend
- Departments of Pathology and Medicine, Harvard Medical School, Boston, MA 02115, USA
|
48
|
Faull KF, Higginson J, Waring AJ, Johnson J, To T, Whitelegge JP, Stevens RL, Fluharty CB, Fluharty AL. Disulfide connectivity in cerebroside sulfate activator is not necessary for biological activity or alpha-helical content but is necessary for trypsin resistance and strong ligand binding. Arch Biochem Biophys 2000; 376:266-74. [PMID: 10775412 DOI: 10.1006/abbi.2000.1714] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Cerebroside sulfate activator (CSAct) protein is exceptionally resistant to heat denaturation and proteolytic digestion. Although water soluble, the protein binds membrane-associated lipids. Its biological role is thought to be to transfer certain lipids between membranes and to facilitate their catabolism in the lysosomes. An example of the latter is the removal of the sulfate group from cerebroside sulfate by arylsulfatase A. The mechanism of lipid sequestration from membranes and presentation of the lipid-protein complex to catabolic enzymes is a crucial aspect of the function of this protein. The widespread occurrence of the protein class of which CSAct is one of the best known members underscores the significance of this protein. The preparation, purification and chemical and biological properties of a stable disulfide-blocked derivative of CSAct are described. The pyridoethylated protein was susceptible to tryptic attack and devoid of a significant population of solvent-protected exchange-resistant protons. It apparently formed a CS complex. However, unlike the complex with the native protein, this was not sufficiently stable to remain intact during size exclusion chromatography. The disulfide-blocked protein had a CD spectrum similar to that of the native protein, indicating similar alpha-helical content. Unexpectedly, the activities of disulfide-blocked protein in the arylsulfatase A-catalyzed sulfate hydrolysis from cerebroside sulfate were substantial. Hitherto, it had been assumed that the disulfide connectivities were essential for the protein to maintain a correctly folded configuration to bind lipid ligands and potentiate their hydrolysis. Some revision of our thoughts on the importance of the disulfide connectivities in the structure and function of the protein is necessary.
Affiliation(s)
- K F Faull
- Pasarow Mass Spectrometry Laboratory, UCLA, Los Angeles, California 90095, USA.
|
49
|
Faull KF, Higginson J, Waring AJ, To T, Whitelegge JP, Stevens RL, Fluharty CB, Fluharty AL. Hydrogen-deuterium exchange signature of porcine cerebroside sulfate activator protein. J Mass Spectrom 2000; 35:392-401. [PMID: 10767769 DOI: 10.1002/(sici)1096-9888(200003)35:3<392::aid-jms948>3.0.co;2-t] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 05/23/2023]
Abstract
Hydrogen-deuterium exchange can be a sensitive indicator of protein structural integrity. Comparisons were made between cerebroside sulfate activator protein (CSAct) in the native state and after treatment with guanidine hydrochloride plus dithiothreitol. Native protein has three internal disulfide bonds and treated protein has no internal disulfide bonds. The comparisons were made using hydrogen-deuterium exchange measured by electrospray ionization mass spectrometry, percentage alpha-helical content measured by circular dichroism and biological activity measured by the ability to support arylsulfatase A-catalyzed sulfate hydrolysis from cerebroside sulfate. In acidic solvent native protein has 59 exchange refractory protons and treated protein has 20 exchange refractory protons (44 and 14% of the exchangeable proton populations, respectively). In native protein the size of the exchange refractory proton population is sensitive to changes in pH, temperature and the presence of a ligand. It is uninfluenced by the presence or absence of glycosyl groups attached to Asn21. Helical content is virtually identical in native and treated protein. Biological activity is significantly reduced but not obliterated in treated protein. The hydrogen-deuterium exchange profile appears to be a sensitive signature of the correctly folded protein, and reflects a dimension of the protein structure that is not apparent in circular dichroic spectra or in the ability of the protein to support arylsulfatase A-catalyzed sulfate hydrolysis from sulfatide. The hydrogen-deuterium exchange profile will be a valuable criterion for characterizing mutant forms of CSAct produced by recombinant and synthetic paradigms and also the native and mutant forms of related proteins.
Affiliation(s)
- K F Faull
- Department of Psychiatry and Biobehavioral Sciences and the Neuropsychiatric Institute, UCLA, Los Angeles, California 90095, USA.
|
50
|
Huang C, Morales G, Vagi A, Chanasyk K, Ferrazzi M, Burklow C, Qiu WT, Feyfant E, Sali A, Stevens RL. Formation of enzymatically active, homotypic, and heterotypic tetramers of mouse mast cell tryptases. Dependence on a conserved Trp-rich domain on the surface. J Biol Chem 2000; 275:351-8. [PMID: 10617625 DOI: 10.1074/jbc.275.1.351] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022] Open
Abstract
Mouse mast cell protease (mMCP) 6 and mMCP-7 are homologous tryptases stored in granules as macromolecular complexes with heparin and/or chondroitin sulfate E containing serglycin proteoglycans. When pro-mMCP-7 and pseudozymogen forms of this tryptase and mMCP-6 were separately expressed in insect cells, all three recombinant proteins were secreted into the conditioned medium as properly folded, enzymatically inactive 33-kDa monomers. However, when their propeptides were removed, mMCP-6 and mMCP-7 became enzymatically active and spontaneously assumed an approximately 150-kDa tetramer structure. Heparin was not required for this structural change. When incubated at 37 °C, recombinant mMCP-7 progressively lost its enzymatic activity in a time-dependent manner. Its N-linked glycans helped regulate the thermal stability of mMCP-7. However, the ability of this tryptase to form the enzymatically active tetramer was more dependent on a highly conserved Trp-rich domain on its surface. Although recombinant mMCP-6 and mMCP-7 preferred to form homotypic tetramers, these tryptases readily formed heterotypic tetramers in vitro. This latter finding indicates that the tetramer structural unit is a novel way the mast cell uses to assemble varied combinations of tryptases.
Affiliation(s)
- C Huang
- Departments of Medicine, Boston, Massachusetts 02115, USA
|