1
Herbst K, Wang T, Forchielli EJ, Thommes M, Paschalidis IC, Segrè D. Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations. Commun Biol 2024; 7:407. [PMID: 38570615 PMCID: PMC10991586 DOI: 10.1038/s42003-024-06093-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 03/22/2024] [Indexed: 04/05/2024]
Abstract
The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm which separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows addressing subset selection problems in areas beyond biology.
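The predictor/response split described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: exhaustive search with ordinary least squares stands in for the mixed integer linear program, and the function name `select_predictors`, the squared-error objective, and the toy matrix are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def select_predictors(Y, k):
    """Pick the k columns of Y that best reconstruct the remaining columns.

    Y has shape (n_strains, n_conditions). Exhaustive search replaces the
    MILP from the abstract (assumption), so this only scales to small inputs.
    Returns (predictor_column_indices, total_squared_error).
    """
    n_cond = Y.shape[1]
    best_subset, best_err = None, np.inf
    for subset in combinations(range(n_cond), k):
        rest = [j for j in range(n_cond) if j not in subset]
        X = Y[:, list(subset)]  # candidate predictor conditions
        R = Y[:, rest]          # response conditions
        # Express every response condition as a linear combination
        # of the candidate predictors.
        coef, *_ = np.linalg.lstsq(X, R, rcond=None)
        err = float(np.sum((R - X @ coef) ** 2))
        if err < best_err:
            best_subset, best_err = subset, err
    return best_subset, best_err


# Hypothetical toy phenotypes: 6 strains x 4 conditions, where the last two
# conditions are exact linear combinations of the first two.
c0 = np.array([1.0, 0.0, 2.0, 1.0, 0.0, 3.0])
c1 = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 0.0])
Y = np.column_stack([c0, c1, 2.0 * c0, c0 + c1])

subset, err = select_predictors(Y, 2)   # two predictors suffice here
_, err_one = select_predictors(Y, 1)    # one predictor leaves residual error
print(subset, err, err_one)
```

Because the toy matrix has rank 2, several different pairs of conditions reconstruct the rest equally well, so the sketch reports whichever pair it encounters first among the ties.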
Affiliation(s)
- Konrad Herbst
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
- Taiyao Wang
  - Division of Systems Engineering, Boston University, Boston, MA, USA
- Elena J Forchielli
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
- Meghan Thommes
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Ioannis Ch Paschalidis
  - Division of Systems Engineering, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA
- Daniel Segrè
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA
3
Dubs NM, Davis BR, de Brito V, Colebrook KC, Tiefel IJ, Nakayama MB, Huang R, Ledvina AE, Hack SJ, Inkelaar B, Martins TR, Aartila SM, Albritton KS, Almuhanna S, Arnoldi RJ, Austin CK, Battle AC, Begeman GR, Bickings CM, Bradfield JT, Branch EC, Conti EP, Cooley B, Dotson NM, Evans CJ, Fries AS, Gilbert IG, Hillier WD, Huang P, Hyde KW, Jevtovic F, Johnson MC, Keeler JL, Lam A, Leach KM, Livsey JD, Lo JT, Loney KR, Martin NW, Mazahem AS, Mokris AN, Nichols DM, Ojha R, Okorafor NN, Paris JR, Reboucas TF, Sant'Anna PB, Seitz MR, Seymour NR, Slaski LK, Stemaly SO, Ulrich BR, Van Meter EN, Young ML, Barkman TJ. A collaborative classroom investigation of the evolution of SABATH methyltransferase substrate preference shifts over 120 million years of flowering plant history. Mol Biol Evol 2022; 39:6503504. [PMID: 35021222 PMCID: PMC8890502 DOI: 10.1093/molbev/msac007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/04/2022]
Abstract
Next-generation sequencing has resulted in an explosion of available data, much of which remains unstudied in terms of biochemical function; yet, experimental characterization of these sequences has the potential to provide unprecedented insight into the evolution of enzyme activity. One way to make inroads into the experimental study of the voluminous data available is to engage students by integrating teaching and research in a college classroom such that eventually hundreds or thousands of enzymes may be characterized. In this study, we capitalize on this potential to focus on SABATH methyltransferase enzymes that have been shown to methylate the important plant hormone, salicylic acid (SA), to form methyl salicylate. We analyze data from 76 enzymes of flowering plant species in 23 orders and 41 families to investigate how widely conserved substrate preference is for SA methyltransferase orthologs. We find a high degree of conservation of substrate preference for SA over the structurally similar metabolite, benzoic acid, with recent switches that appear to be associated with gene duplication and at least three cases of functional compensation by paralogous enzymes. The presence of Met in active site position 150 is a useful predictor of SA methylation preference in SABATH methyltransferases but enzymes with other residues in the homologous position show the same substrate preference. Although our dense and systematic sampling of SABATH enzymes across angiosperms has revealed novel insights, this is merely the “tip of the iceberg” since thousands of sequences remain uncharacterized in this enzyme family alone.
Affiliation(s)
- Nicole M Dubs
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Breck R Davis
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Victor de Brito
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Kate C Colebrook
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Ian J Tiefel
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Madison B Nakayama
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Ruiqi Huang
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Audrey E Ledvina
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Samantha J Hack
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Brent Inkelaar
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Talline R Martins
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Sarah M Aartila
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Kelli S Albritton
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Sarah Almuhanna
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Ryan J Arnoldi
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Clara K Austin
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Amber C Battle
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Gregory R Begeman
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Caitlin M Bickings
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Jonathon T Bradfield
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Eric C Branch
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Eric P Conti
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Breana Cooley
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Nicole M Dotson
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Cheyone J Evans
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Amber S Fries
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Ivan G Gilbert
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Weston D Hillier
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Pornkamol Huang
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Kaitlin W Hyde
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Filip Jevtovic
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Mark C Johnson
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Julie L Keeler
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Albert Lam
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Kyle M Leach
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Jeremy D Livsey
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Jonathan T Lo
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Kevin R Loney
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Nich W Martin
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Amber S Mazahem
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Aurora N Mokris
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Destiny M Nichols
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Ruchi Ojha
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Nnanna N Okorafor
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Joshua R Paris
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Mathew R Seitz
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Nathan R Seymour
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Lila K Slaski
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Stephen O Stemaly
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Benjamin R Ulrich
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Emile N Van Meter
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Meghan L Young
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
- Todd J Barkman
  - Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008
4
Dimonaco NJ, Aubrey W, Kenobi K, Clare A, Creevey CJ. No one tool to rule them all: prokaryotic gene prediction tool annotations are highly dependent on the organism of study. Bioinformatics 2021; 38:1198-1207. [PMID: 34875010 PMCID: PMC8825762 DOI: 10.1093/bioinformatics/btab827] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/26/2021] [Revised: 11/13/2021] [Accepted: 12/02/2021] [Indexed: 01/06/2023]
Abstract
MOTIVATION: The biases in CoDing Sequence (CDS) prediction tools, which have been based on historic genomic annotations from model organisms, impact our understanding of novel genomes and metagenomes. This hinders the discovery of new genomic information as it results in predictions being biased towards existing knowledge. To date, users have lacked a systematic and replicable approach to identify the strengths and weaknesses of any CDS prediction tool and allow them to choose the right tool for their analysis.
RESULTS: We present an evaluation framework (ORForise) based on a comprehensive set of 12 primary and 60 secondary metrics that facilitate the assessment of the performance of CDS prediction tools. This makes it possible to identify which performs better for specific use-cases. We use this to assess 15 ab initio- and model-based tools representing those most widely used (historically and currently) to generate the knowledge in genomic databases. We find that the performance of any tool is dependent on the genome being analysed, and no individual tool ranked as the most accurate across all genomes or metrics analysed. Even the top-ranked tools produced conflicting gene collections, which could not be resolved by aggregation. The ORForise evaluation framework provides users with a replicable, data-led approach to make informed tool choices for novel genome annotations and for refining historical annotations.
AVAILABILITY AND IMPLEMENTATION: Code and datasets for reproduction and customisation are available at https://github.com/NickJD/ORForise.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicholas J Dimonaco
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3PD, UK. To whom correspondence should be addressed.
- Wayne Aubrey
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
- Kim Kenobi
  - Department of Mathematics, Aberystwyth University, Aberystwyth SY23 3BZ, UK
- Amanda Clare
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
5
Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022]
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease is based on a single drug, praziquantel, which raises concerns about the development of drug resistance. This, and the lack of efficacy of praziquantel against juvenile worms, highlights the urgency for new antischistosomal therapies. In this review we focus on innovative approaches to the identification of antischistosomal drug candidates, including the use of automated assays, fragment-based screening, computer-aided and artificial intelligence-based computational methods. We highlight the current developments that may contribute to optimizing research outputs and lead to more effective drugs for this highly prevalent disease, in a more cost-effective drug discovery endeavor.
Affiliation(s)
- José T. Moreira-Filho
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Arthur C. Silva
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Rafael F. Dantas
  - LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Barbara F. Gomes
  - LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Lauro R. Souza Neto
  - LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Jose Brandao-Neto
  - Diamond Light Source Ltd., Didcot, United Kingdom
  - Research Complex at Harwell, Didcot, United Kingdom
- Raymond J. Owens
  - The Rosalind Franklin Institute, Harwell, United Kingdom
  - Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Nicholas Furnham
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Bruno J. Neves
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Floriano P. Silva-Junior
  - LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Carolina H. Andrade
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil