1. Hagan AK, Lesniak NA, Balunas MJ, Bishop L, Close WL, Doherty MD, Elmore AG, Flynn KJ, Hannigan GD, Koumpouras CC, Jenior ML, Kozik AJ, McBride K, Rifkin SB, Stough JMA, Sovacool KL, Sze MA, Tomkovich S, Topcuoglu BD, Schloss PD. Ten simple rules to increase computational skills among biologists with Code Clubs. PLoS Comput Biol 2020; 16:e1008119. [PMID: 32853198; PMCID: PMC7451508; DOI: 10.1371/journal.pcbi.1008119]
Affiliation(s)
- Ada K. Hagan
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Nicholas A. Lesniak
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Marcy J. Balunas
  - Division of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Lucas Bishop
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- William L. Close
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Matthew D. Doherty
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Amanda G. Elmore
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Kaitlin J. Flynn
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Geoffrey D. Hannigan
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Charlie C. Koumpouras
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Matthew L. Jenior
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Ariangela J. Kozik
  - Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Kathryn McBride
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Samara B. Rifkin
  - Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Joshua M. A. Stough
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Kelly L. Sovacool
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Marc A. Sze
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Sarah Tomkovich
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Begum D. Topcuoglu
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Patrick D. Schloss
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
2. Topçuoğlu BD, Lesniak NA, Ruffin MT, Wiens J, Schloss PD. A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems. mBio 2020; 11:e00434-20. [PMID: 32518182; PMCID: PMC7373189; DOI: 10.1128/mbio.00434-20] [Received: 02/24/2020] [Accepted: 05/06/2020]
Abstract
Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients: 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVMs) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs, with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739), but it was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.
IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without discussing how difficult such models are to interpret when trying to identify microbial biomarkers of disease. This work is a step toward more reproducible practices for applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.
Affiliation(s)
- Begüm D Topçuoğlu
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
- Nicholas A Lesniak
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
- Mack T Ruffin
  - Department of Family Medicine and Community Medicine, Penn State Hershey Medical Center, Hershey, Pennsylvania, USA
- Jenna Wiens
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
- Patrick D Schloss
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
3. Sze MA, Topçuoğlu BD, Lesniak NA, Ruffin MT, Schloss PD. Fecal Short-Chain Fatty Acids Are Not Predictive of Colonic Tumor Status and Cannot Be Predicted Based on Bacterial Community Structure. mBio 2019; 10:e01454-19. [PMID: 31266879; PMCID: PMC6606814; DOI: 10.1128/mbio.01454-19] [Received: 06/05/2019] [Accepted: 06/07/2019]
Abstract
Colonic bacterial populations are thought to have a role in the development of colorectal cancer, with some protecting against inflammation and others exacerbating it. Short-chain fatty acids (SCFAs) have been shown to have anti-inflammatory properties and are produced in large quantities by colonic bacteria that ferment fiber. We assessed whether there was an association between fecal SCFA concentrations and the presence of colonic adenomas or carcinomas in a cohort of individuals using 16S rRNA gene and metagenomic shotgun sequence data. We measured the fecal concentrations of acetate, propionate, and butyrate within the cohort and found no significant associations between SCFA concentration and tumor status. When we incorporated these concentrations into random forest classification models trained to differentiate between people with healthy colons and those with adenomas or carcinomas, they did not significantly improve the ability of 16S rRNA gene or metagenomic gene sequence-based models to classify individuals. Finally, we generated random forest regression models trained to predict the concentration of each SCFA based on 16S rRNA gene or metagenomic gene sequence data from the same samples. These models performed poorly and explained at most 14% of the observed variation in the SCFA concentrations. These results support the broader epidemiological data that question the value of fiber consumption for reducing the risk of colorectal cancer. Although other bacterial metabolites may serve as biomarkers to detect adenomas or carcinomas, fecal SCFA concentrations have limited predictive power.
IMPORTANCE Considering that colorectal cancer is the third leading cause of cancer-related death in the United States, it is important to detect colorectal tumors early and to prevent tumors from forming. Short-chain fatty acids (SCFAs) are often used as a surrogate measure of gut health and are considered anticarcinogenic because of their anti-inflammatory properties. We evaluated the fecal SCFA concentrations of a cohort of individuals with different colonic tumor burdens who were previously analyzed to identify microbiome-based biomarkers of tumors. We were unable to find an association between SCFA concentration and tumor burden or to use SCFAs to improve our microbiome-based models for classifying people by tumor status. Furthermore, we were unable to find an association between fecal community structure and SCFA concentrations. Our results indicate that the association between fecal SCFAs, the gut microbiome, and tumor burden is weak.
Affiliation(s)
- Marc A Sze
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
- Begüm D Topçuoğlu
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
- Nicholas A Lesniak
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
- Mack T Ruffin
  - Department of Family Medicine and Community Medicine, Penn State Hershey Medical Center, Hershey, Pennsylvania, USA
- Patrick D Schloss
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
4. Li Z, Shanmuganathan A, Ruetz M, Yamada K, Lesniak NA, Kräutler B, Brunold TC, Koutmos M, Banerjee R. Coordination chemistry controls the thiol oxidase activity of the B12-trafficking protein CblC. J Biol Chem 2017; 292:9733-9744. [PMID: 28442570; DOI: 10.1074/jbc.m117.788554] [Received: 03/27/2017] [Revised: 04/20/2017]
Abstract
The cobalamin or B12 cofactor supports sulfur and one-carbon metabolism and the catabolism of odd-chain fatty acids, branched-chain amino acids, and cholesterol. CblC is a B12-processing enzyme involved in an early cytoplasmic step in the cofactor-trafficking pathway. It catalyzes the glutathione (GSH)-dependent dealkylation of alkylcobalamins and the reductive decyanation of cyanocobalamin. CblC from Caenorhabditis elegans (ceCblC) also exhibits a robust thiol oxidase activity, converting reduced GSH to oxidized GSSG with concomitant scrubbing of ambient dissolved O2. The mechanism of thiol oxidation catalyzed by ceCblC is not known. In this study, we demonstrate that novel coordination chemistry accessible to ceCblC-bound cobalamin supports its thiol oxidase activity via a glutathionyl-cobalamin intermediate. Deglutathionylation of glutathionyl-cobalamin by a second molecule of GSH yields GSSG. The crystal structure of ceCblC provides insights into how architectural differences at the α- and β-faces of cobalamin promote the thiol oxidase activity of ceCblC but mute it in wild-type human CblC. The R161G and R161Q mutations in human CblC unmask its latent thiol oxidase activity and are correlated with increased cellular oxidative stress disease. In summary, we have uncovered key architectural features in the cobalamin-binding pocket that support unusual cob(II)alamin coordination chemistry and enable the thiol oxidase activity of ceCblC.
Affiliation(s)
- Zhu Li
  - Department of Biological Chemistry, University of Michigan Medical Center, Ann Arbor, Michigan 48109-0600
- Aranganathan Shanmuganathan
  - Department of Biochemistry, Uniformed Services University of the Health Sciences, Bethesda, Maryland 20814
- Markus Ruetz
  - Department of Biological Chemistry, University of Michigan Medical Center, Ann Arbor, Michigan 48109-0600
- Kazuhiro Yamada
  - Department of Biochemistry, Uniformed Services University of the Health Sciences, Bethesda, Maryland 20814
- Nicholas A Lesniak
  - Department of Biological Chemistry, University of Michigan Medical Center, Ann Arbor, Michigan 48109-0600
- Bernhard Kräutler
  - Institute of Organic Chemistry and Centre of Molecular Biosciences, University of Innsbruck, Innrain 80/82, 6020 Innsbruck, Austria
- Thomas C Brunold
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Markos Koutmos
  - Department of Biochemistry, Uniformed Services University of the Health Sciences, Bethesda, Maryland 20814
- Ruma Banerjee
  - Department of Biological Chemistry, University of Michigan Medical Center, Ann Arbor, Michigan 48109-0600
5. Li Z, Lesniak NA, Banerjee R. Unusual aerobic stabilization of Cob(I)alamin by a B12-trafficking protein allows chemoenzymatic synthesis of organocobalamins. J Am Chem Soc 2014; 136:16108-11. [PMID: 25369151; DOI: 10.1021/ja5077316]
Abstract
CblC, a B12-trafficking protein, exhibits glutathione transferase and reductive decyanase activities for processing alkylcobalamins and cyanocobalamin, respectively, to a common intermediate that is subsequently converted to the biologically active forms of the cofactor. We recently discovered that Caenorhabditis elegans CblC catalyzes thiol-dependent decyanation of cyanocobalamin (CNCbl) and reduction of aquocobalamin (OH2Cbl) and stabilizes the paramagnetic cob(II)alamin product under aerobic conditions. In this study, we report the striking ability of the worm CblC to stabilize the highly reactive cob(I)alamin product of the glutathione transferase reaction. The unprecedented stabilization of the supernucleophilic cob(I)alamin species under aerobic conditions by the intrinsic thiol oxidase activity of CblC was exploited for the chemoenzymatic synthesis of organocobalamin derivatives under mild conditions.
Affiliation(s)
- Zhu Li
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan 48109-0600, United States
6. Li Z, Gherasim C, Lesniak NA, Banerjee R. Glutathione-dependent one-electron transfer reactions catalyzed by a B₁₂ trafficking protein. J Biol Chem 2014; 289:16487-97. [PMID: 24742678; DOI: 10.1074/jbc.m114.567339]
Abstract
CblC is involved in an early step in cytoplasmic cobalamin processing following entry of the cofactor into the cytoplasm. CblC converts the cobalamin cargo arriving from the lysosome to a common cob(II)alamin intermediate, which can be subsequently converted to the biologically active forms. Human CblC exhibits glutathione (GSH)-dependent alkyltransferase activity and flavin-dependent reductive decyanation activity with cyanocobalamin (CNCbl). In this study, we discovered two new GSH-dependent activities associated with the Caenorhabditis elegans CblC for generating cob(II)alamin: decyanation of CNCbl and reduction of aquocobalamin (OH2Cbl). We subsequently found that human CblC also catalyzes GSH-dependent decyanation of CNCbl and reduction of OH2Cbl, albeit efficiently only under anaerobic conditions. The air sensitivity of the human enzyme suggests interception by oxygen during the single-electron transfer step from GSH to CNCbl. These newly discovered GSH-dependent single-electron transfer reactions expand the repertoire of catalytic activities supported by CblC, a versatile B12-processing enzyme.
Affiliation(s)
- Zhu Li
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan 48109-0600
- Carmen Gherasim
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan 48109-0600
- Nicholas A Lesniak
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan 48109-0600
- Ruma Banerjee
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan 48109-0600