1
Rezende PM, Xavier JS, Ascher DB, Fernandes GR, Pires DEV. Evaluating hierarchical machine learning approaches to classify biological databases. Brief Bioinform 2022; 23:6611916. [PMID: 35724625 PMCID: PMC9310517 DOI: 10.1093/bib/bbac216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/29/2022] [Accepted: 05/09/2022] [Indexed: 12/04/2022]
Abstract
The rate of biological data generation has increased dramatically in recent years, which has driven the importance of databases as a resource to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, imposing different challenges to train, test and validate accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These include ‘Local’ approaches considering the hierarchy, building models per level or node, and ‘Global’ hierarchical classification, using a flat classification approach. To fill this gap, here we have systematically contrasted the performance of ‘Local per Level’ and ‘Local per Node’ approaches with a ‘Global’ approach applied to two different hierarchical datasets: BioLip and CATH. The results show how different components of hierarchical data sets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
Affiliation(s)
- Pâmela M Rezende: Universidade Federal de Minas Gerais; Instituto René Rachou, Fundação Oswaldo Cruz; Stilingue Inteligência Artificial
- Joicymara S Xavier: Universidade Federal de Minas Gerais; Instituto René Rachou, Fundação Oswaldo Cruz; Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri
- David B Ascher: School of Chemistry and Molecular Biosciences, University of Queensland; Systems and Computational Biology, Bio 21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
- Douglas E V Pires: Systems and Computational Biology, Bio 21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute; School of Computing and Information Systems, University of Melbourne
2
Pires DEV, Veloso WNP, Myung Y, Rodrigues CHM, Silk M, Rezende PM, Silva F, Xavier JS, Velloso JPL, da Silveira CH, Ascher DB. EasyVS: a user-friendly web-based tool for molecule library selection and structure-based virtual screening. Bioinformatics 2021; 36:4200-4202. [PMID: 32399551 DOI: 10.1093/bioinformatics/btaa480] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/31/2018] [Revised: 04/01/2020] [Accepted: 05/05/2020] [Indexed: 11/14/2022]
Abstract
Summary: EasyVS is a web-based platform built to simplify molecule library selection and virtual screening. With an intuitive interface, the tool allows users to go from selecting a protein target with a known structure and tailoring a purchasable molecule library to performing and visualizing docking in a few clicks. Our system also allows users to filter screening libraries based on molecule properties, cluster molecules by similarity and personalize docking parameters.
Availability and implementation: EasyVS is freely available as an easy-to-use web interface at http://biosig.unimelb.edu.au/easyvs.
Contact: douglas.pires@unimelb.edu.au or david.ascher@unimelb.edu.au.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Douglas E V Pires: School of Computing and Information Systems, University of Melbourne, Melbourne 3010, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia; Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Melbourne 3010, Australia
- Wandré N P Veloso: Institute of Technological Sciences, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- YooChan Myung: Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia; Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Melbourne 3010, Australia
- Carlos H M Rodrigues: Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia; Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Melbourne 3010, Australia
- Michael Silk: Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia; Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Melbourne 3010, Australia
- Pâmela M Rezende: Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
- Francislon Silva: Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
- Joicymara S Xavier: Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil; Instituto de Ciências Agrárias, Universidade Federal dos Vales do Jequitinhonha e Mucuri, Unaí 38610-000, Brazil
- João P L Velloso: Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
- Carlos H da Silveira: Institute of Technological Sciences, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- David B Ascher: Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia; Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Melbourne 3010, Australia; Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
3
Xavier JS, Nguyen TB, Karmarkar M, Portelli S, Rezende PM, Velloso JPL, Ascher DB, Pires DEV. ThermoMutDB: a thermodynamic database for missense mutations. Nucleic Acids Res 2021; 49:D475-D479. [PMID: 33095862 PMCID: PMC7778973 DOI: 10.1093/nar/gkaa925] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 08/15/2020] [Revised: 09/21/2020] [Accepted: 10/12/2020] [Indexed: 01/17/2023]
Abstract
Proteins are intricate, dynamic structures, and small changes in their amino acid sequences can lead to large effects on their folding, stability and dynamics. To facilitate the further development and evaluation of methods to predict these changes, we have developed ThermoMutDB, a manually curated database containing >14,669 experimental data of thermodynamic parameters for wild type and mutant proteins. This represents an increase of 83% in unique mutations over previous databases and includes thermodynamic information on 204 new proteins. During manual curation we have also corrected annotation errors in previously curated entries. Associated with each entry, we have included information on the unfolding Gibbs free energy and melting temperature change, and have associated entries with available experimental structural information. ThermoMutDB supports users to contribute to new data points and programmatic access to the database via a RESTful API. ThermoMutDB is freely available at: http://biosig.unimelb.edu.au/thermomutdb.
Affiliation(s)
- Joicymara S Xavier: Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri; Instituto René Rachou, Fundação Oswaldo Cruz
- Malancha Karmarkar: Bio 21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
- Stephanie Portelli: Bio 21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
- David B Ascher: Bio 21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute; Department of Biochemistry, University of Cambridge
- Douglas E V Pires: Bio 21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute; School of Computing and Information Systems, University of Melbourne
4
Abstract
Soybean is one of the main sources of foreign exchange credits for Brazil in the agricultural sector. There is increasing interest in growing this leguminous crop, especially in the southern region of Minas Gerais, due to its importance as an alternative for crop rotation with maize. In this respect, the study of the adaptability of new cultivars to the region is indispensable so as to obtain high yields. Thus, the aim of this study was to evaluate the performance of 38 soybean cultivars for growing in the summer season in the municipality of Lavras, MG, Brazil, in the 2010/2011 and 2011/2012 crop years. The experiments were conducted in a randomized block design with 3 replications and the treatments consisted of 38 cultivars. At the time of harvest, the following assessments were made: grain yield (kg/ha), height of the lowest pod (cm), plant height (cm), and lodging. The data were subjected to individual and combined analysis of variance. The phenotypic mean values were clustered, adopting the Scott and Knott test. For simultaneous selection of multiple traits, the sum of rank index of Mulamba and Mock was adopted. The cultivar TMG 801 RR had the best yield performance; the cultivars Monsoy 8001, MGBR-46 (Conquista), and BRSMG 68 (Vencedora) also stood out. Considering simultaneous selection for grain yield, plant height, height of the lowest pod, and lodging, the cultivar TMG 801 RR is recommended for growing in the summer season in the southern region of Minas Gerais.
Affiliation(s)
- I O Soares: Departamento de Agricultura, Universidade Federal de Lavras, Lavras, MG, Brasil
- P M Rezende: Departamento de Agricultura, Universidade Federal de Lavras, Lavras, MG, Brasil
- A T Bruzi: Departamento de Agricultura, Universidade Federal de Lavras, Lavras, MG, Brasil
- E V Zambiazzi: Departamento de Agricultura, Universidade Federal de Lavras, Lavras, MG, Brasil
- A M Zuffo: Departamento de Agricultura, Universidade Federal de Lavras, Lavras, MG, Brasil
- K B Silva: Departamento de Agricultura, Universidade Federal de Lavras, Lavras, MG, Brasil
- R Gwinner: Departamento de Agricultura, Universidade Federal de Lavras, Lavras, MG, Brasil
5
Arantes UM, Stringhini JH, Oliveira MC, Martins PC, Rezende PM, Andrade MA, Leandro NSM, Café MB. Effect of different electrolyte balances in broiler diets. Rev Bras Cienc Avic 2013. [DOI: 10.1590/s1516-635x2013000300010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]