1
Garbulowski M, Diamanti K, Smolińska K, Baltzer N, Stoll P, Bornelöv S, Øhrn A, Feuk L, Komorowski J. R.ROSETTA: an interpretable machine learning framework. BMC Bioinformatics 2021; 22:110. [PMID: 33676405 PMCID: PMC7937228 DOI: 10.1186/s12859-021-04049-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/19/2020] [Accepted: 02/24/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, viz. in Life Sciences, it is often more important to understand how a prediction was obtained rather than knowing what prediction was made. To this end so-called interpretable machine learning has been recently advocated. In this study, we implemented an interpretable machine learning package based on the rough set theory. An important aim of our work was provision of statistical properties of the models and their components.
RESULTS: We present the R.ROSETTA package, which is an R wrapper of ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well-suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that facilitate minimization of analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case-control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.
CONCLUSIONS: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
Affiliation(s)
- Mateusz Garbulowski
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Klev Diamanti
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Karolina Smolińska
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Nicholas Baltzer
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Department of Research, Cancer Registry of Norway, Oslo, Norway
- Patricia Stoll
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
- Susanne Bornelöv
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Lars Feuk
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Jan Komorowski
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden.
- Swedish Collegium for Advanced Study, Uppsala, Sweden.
- Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland.
- Washington National Primate Research Center, Seattle, WA, USA.
2
Gil-Herrera E, Yalcin A, Tsalatsanis A, Barnes LE, Djulbegovic B. Towards a classification model to identify hospice candidates in terminally ill patients. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference 2013; 2012:1278-81. [PMID: 23366132 DOI: 10.1109/embc.2012.6346171] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
This paper presents a Rough Set Theory (RST) based classification model to identify hospice candidates within a group of terminally ill patients. Hospice care considerations are particularly valuable for terminally ill patients since they enable patients and their families to initiate end-of-life discussions and choose the most desired management strategy for the remainder of their lives. Unlike traditional data mining methodologies, our approach seeks to identify subgroups of patients possessing common characteristics that distinguish them from other subgroups in the dataset. Thus, heterogeneity in the dataset is captured before the classification model is built. Object-related reducts are used to obtain the minimum set of attributes that describe each subgroup existing in the dataset. As a result, a collection of decision rules is derived for classifying new patients based on the subgroup to which they belong. Results show improvements in the classification accuracy compared to a traditional RST methodology, in which patient diversity is not considered. We envision our work as a part of a comprehensive decision support system designed to facilitate end-of-life care decisions. Retrospective data from 9105 patients is used to demonstrate the design and implementation details of the classification model.
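As a rough illustration of the idea of object-related reducts mentioned in the abstract, the sketch below finds, for one row of a toy decision table, the minimal attribute subsets that still separate it from every row with a different decision, and turns one of them into a decision rule. The table, the attribute names (`age`, `mobility`, `appetite`, `hospice`) and the helper functions are hypothetical and chosen only for illustration. The brute-force search is a textbook-style simplification, not the authors' implementation.

```python
from itertools import combinations


def object_reducts(rows, attrs, decision, idx):
    """Return all minimal attribute subsets that discern rows[idx] from every
    row with a different decision value (brute force, small tables only)."""
    x = rows[idx]
    # For each differently-labelled row, collect the attributes on which it differs from x.
    diffs = []
    for y in rows:
        if y[decision] != x[decision]:
            d = {a for a in attrs if y[a] != x[a]}
            if d:  # rows identical to x on all attributes cannot be separated at all
                diffs.append(d)
    if not diffs:
        return [set()]  # nothing to discern: the empty set suffices
    reducts = []
    # Enumerate subsets by increasing size so that minimal ones are found first.
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            s = set(subset)
            if any(r <= s for r in reducts):
                continue  # a smaller reduct is contained in s, so s is not minimal
            if all(s & d for d in diffs):
                reducts.append(s)
    return reducts


def rule_from_reduct(row, reduct, decision):
    """Format the decision rule induced by one reduct of one row."""
    conds = " AND ".join(f"{a}={row[a]}" for a in sorted(reduct))
    return f"IF {conds} THEN {decision}={row[decision]}"


# Hypothetical toy decision table (not data from the paper).
rows = [
    {"age": "old", "mobility": "low", "appetite": "poor", "hospice": "yes"},
    {"age": "old", "mobility": "high", "appetite": "good", "hospice": "no"},
    {"age": "young", "mobility": "low", "appetite": "good", "hospice": "no"},
    {"age": "young", "mobility": "low", "appetite": "poor", "hospice": "yes"},
]
attrs = ["age", "mobility", "appetite"]

reducts = object_reducts(rows, attrs, "hospice", 0)
rule = rule_from_reduct(rows[0], reducts[0], "hospice")
print(reducts)
print(rule)
```

For the first row, this toy table yields two reducts, one using `appetite` alone and one combining `age` and `mobility`. Exhaustive enumeration grows exponentially with the number of attributes, so it is only suitable for small examples like this one.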
Affiliation(s)
- Eleazar Gil-Herrera
- Department of Industrial and Management System Engineering, University of South Florida, Tampa, FL 33620, USA.