1
Kaczmarzyk JR, Gupta R, Kurc TM, Abousamra S, Saltz JH, Koo PK. ChampKit: A framework for rapid evaluation of deep neural networks for patch-based histopathology classification. Computer Methods and Programs in Biomedicine 2023; 239:107631. [PMID: 37271050; PMCID: PMC11093625; DOI: 10.1016/j.cmpb.2023.107631]
Abstract
BACKGROUND AND OBJECTIVE Histopathology is the gold standard for the diagnosis of many cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for many tasks, including the detection of immune cells and microsatellite instability. However, it remains difficult to identify optimal models and training configurations for different histopathology classification tasks due to the abundance of available architectures and the lack of systematic evaluations. Our objective in this work is to present a software tool that addresses this need and enables robust, systematic evaluation of neural network models for patch classification in histology in a lightweight, easy-to-use package for both algorithm developers and biomedical researchers. METHODS Here we present ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit): an extensible, fully reproducible evaluation toolkit that is a one-stop shop to train and evaluate deep neural networks for patch classification. ChampKit curates a broad range of public datasets. It enables training and evaluation of models supported by timm directly from the command line, without the need for users to write any code. External models are enabled through a straightforward API and minimal coding. As a result, ChampKit facilitates the evaluation of existing and new models and deep learning architectures on pathology datasets, making it more accessible to the broader scientific community. To demonstrate the utility of ChampKit, we establish baseline performance for a subset of possible models that could be employed with ChampKit, focusing on several popular deep learning models, namely ResNet18, ResNet50, and R26-ViT, a hybrid vision transformer. In addition, we compare each model trained either from random weight initialization or with transfer learning from ImageNet-pretrained models. For ResNet18, we also consider transfer learning from a self-supervised pretrained model. RESULTS The main result of this paper is the ChampKit software. Using ChampKit, we were able to systematically evaluate multiple neural networks across six datasets. We observed mixed results when evaluating the benefits of pretraining versus random initialization, with no clear benefit except in the low-data regime, where transfer learning was found to be beneficial. Surprisingly, we found that transfer learning from self-supervised weights rarely improved performance, which runs counter to findings in other areas of computer vision. CONCLUSIONS Choosing the right model for a given digital pathology dataset is nontrivial. ChampKit provides a valuable tool to fill this gap by enabling the evaluation of hundreds of existing (or user-defined) deep learning models across a variety of pathology tasks. Source code and data for the tool are freely accessible at https://github.com/SBU-BMI/champkit.
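ChampKit itself is driven from the command line, and the repository linked above is the authoritative reference. Purely as an illustration of the kind of timm-based workflow the toolkit automates, the sketch below instantiates the three baseline architectures named in the abstract with and without ImageNet pretraining; timm.create_model is timm's public API, while the binary classification head and the specific R26-ViT identifier are assumptions made for this sketch.

```python
# Illustrative only: the kind of timm model setup ChampKit automates from the
# command line (see https://github.com/SBU-BMI/champkit for the actual tool).
import timm
import torch

# Baseline architectures from the abstract. The binary head (num_classes=2)
# and the R26-ViT identifier below are assumptions, not ChampKit's config.
for name in ["resnet18", "resnet50", "vit_small_r26_s32_224"]:
    for pretrained in (True, False):  # ImageNet transfer vs. random init
        model = timm.create_model(name, pretrained=pretrained, num_classes=2)
        model.eval()
        with torch.no_grad():
            logits = model(torch.randn(1, 3, 224, 224))  # one dummy RGB patch
        print(f"{name} (pretrained={pretrained}): {tuple(logits.shape)}")
```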
Affiliation(s)
- Jakub R Kaczmarzyk
- Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA; Simons Center for Quantitative Biology, 1 Bungtown Rd, Cold Spring Harbor, 11724, NY, USA
- Rajarsi Gupta
- Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA
- Tahsin M Kurc
- Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA
- Shahira Abousamra
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Joel H Saltz
- Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA
- Peter K Koo
- Simons Center for Quantitative Biology, 1 Bungtown Rd, Cold Spring Harbor, 11724, NY, USA
2
Sharma NK, Ayyala R, Deshpande D, Patel YM, Munteanu V, Ciorba D, Fiscutean A, Vahed M, Sarkar A, Guo R, Moore A, Darci-Maher N, Nogoy NA, Abedalthagafi MS, Mangul S. Analytical code sharing practices in biomedical research. bioRxiv 2023:2023.07.31.551384. [PMID: 37609176; PMCID: PMC10441317; DOI: 10.1101/2023.07.31.551384]
Abstract
Data-driven computational analysis is becoming increasingly important in biomedical research as the amount of data being generated continues to grow. However, weak practices for sharing research outputs, such as data, source code, and methods, undermine the transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible because insufficient documentation, code, and data are shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them failed to share the analytical code. Even among those that did disclose their code, the vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We discovered a significant association between the presence of code availability statements and increased code availability (p = 2.71 × 10⁻⁹). Additionally, a greater proportion of studies conducting secondary analyses shared their code compared to those conducting primary analyses (p = 1.15 × 10⁻⁷). In light of our findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability, thereby improving reproducibility in biomedical research. By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discoveries. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.
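The association reported above is, in form, a 2×2 contingency analysis. As a hedged illustration of that style of test, the sketch below runs Fisher's exact test on made-up counts; the cell values are hypothetical and are not the paper's data.

```python
# Testing whether a code-availability statement is associated with actually
# shared code. The 2x2 counts below are HYPOTHETICAL, not the paper's data.
from scipy.stats import fisher_exact

#                   code shared   code not shared
table = [[120, 40],   # availability statement present
         [55, 238]]   # availability statement absent

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2e}")
```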
Affiliation(s)
- Nitesh Kumar Sharma
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA
- Ram Ayyala
- Quantitative and Computational Biology Department, USC Dana and David Dornsife College of Letters, Arts, and Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA
- Dhrithi Deshpande
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA
- Yesha M Patel
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA
- Viorel Munteanu
- Department of Computers, Informatics and Microelectronics, Technical University of Moldova, Chisinau, 2045, Moldova
- Dumitru Ciorba
- Department of Computers, Informatics and Microelectronics, Technical University of Moldova, Chisinau, 2045, Moldova
- Andrada Fiscutean
- Faculty of Journalism and Communication Studies, University of Bucharest, Soseaua Panduri, nr. 90, Sector 5, 050663, Bucharest, Romania
- Mohammad Vahed
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA
- Aditya Sarkar
- School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, North Campus, Kamand, Mandi, Himachal Pradesh, 175005, India
- Ruiwei Guo
- Department of Pharmacology and Pharmaceutical Sciences, USC Alfred E. Mann School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Room 713, Los Angeles, CA 90089-9121, USA
- Andrew Moore
- Daniel J. Epstein Department of Industrial and Systems Engineering, Viterbi School of Engineering, University of Southern California
- Nicholas Darci-Maher
- Computational and Systems Biology, University of California, Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Nicole A Nogoy
- GigaScience Press, L26/F, Kings Wing Plaza 2, 1 On Kwan Street, Shek Mun, N.T., Hong Kong
- Malak S Abedalthagafi
- Department of Pathology & Laboratory Medicine, Emory University Hospital, Atlanta, GA, USA
- King Salman Center for Disability Research, Riyadh, Saudi Arabia
- Serghei Mangul
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA
- Department of Quantitative and Computational Biology, University of Southern California Dornsife College of Letters, Arts, and Sciences, Los Angeles, CA 90089, USA
3
Multispectral Image under Tissue Classification Algorithm in Screening of Cervical Cancer. Journal of Healthcare Engineering 2022; 2022:9048123. [PMID: 35035863; PMCID: PMC8759862; DOI: 10.1155/2022/9048123]
Abstract
The objectives of this study were to improve the efficiency and accuracy of early clinical diagnosis of cervical cancer and to explore the application of a tissue classification algorithm combined with multispectral imaging in cervical cancer screening. Fifty patients with suspected cervical cancer were selected. First, multispectral imaging was used to collect images of the cervical tissues of the 50 patients under the conventional white light waveband, the narrowband green light waveband, and the narrowband blue light waveband. Second, the collected multispectral images were fused, and the tissue classification algorithm was used to segment the diseased area based on differences in contrast and other characteristics between lesional and non-lesional cervical tissue in the fused multiband image; the segmentation was then compared with the results of the pathological examination. The average gradient, standard deviation (SD), and image entropy were adopted to evaluate image quality, and sensitivity and specificity were selected to evaluate the clinical application value of the proposed method. The fused spectral image differed clearly from the lesion-free image, and its contrast (0.7549) was higher than before fusion (0.4716), a statistically significant difference (P < 0.05). The average gradient, SD, and image entropy of the multispectral image assisted by the tissue classification algorithm were 2.0765, 65.2579, and 4.974, respectively, each higher than the corresponding previously reported values (P < 0.05). The sensitivity and specificity of the multispectral image with the tissue classification algorithm were 85.3% and 70.8%, respectively, both greater than those of the image without the algorithm. These results show that tissue-classification-assisted multispectral imaging can effectively screen for cervical cancer and can quickly, efficiently, and safely segment cervical tissue into lesional and non-lesional areas. The segmentation agreed with the physician's examination, indicating high clinical application value and providing an effective reference for the clinical use of tissue-classification-assisted multispectral imaging in the early screening and diagnosis of cervical cancer.
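The three image-quality metrics quoted above (average gradient, SD, image entropy) have conventional definitions; the NumPy sketch below shows one standard way to compute them for a grayscale image. These are the textbook formulas, not necessarily the exact variants used in the study.

```python
# Conventional image-quality metrics from the abstract, computed on a
# grayscale image; the study's exact formula variants may differ.
import numpy as np

def average_gradient(img: np.ndarray) -> float:
    """Mean local gradient magnitude, a proxy for image sharpness."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def image_entropy(img: np.ndarray, levels: int = 256) -> float:
    """Shannon entropy (bits) of the gray-level histogram."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

fused = np.random.randint(0, 256, size=(256, 256))  # stand-in for a fused image
print(average_gradient(fused), float(np.std(fused)), image_entropy(fused))
```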
4
The ability to classify patients based on gene-expression data varies by algorithm and performance metric. PLoS Computational Biology 2022; 18:e1009926. [PMID: 35275931; PMCID: PMC8942277; DOI: 10.1371/journal.pcbi.1009926]
Abstract
By classifying patients into subgroups, clinicians can provide more effective care than with a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist, and most support diverse hyperparameters, so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross-validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
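Nested cross-validation as described here (hyperparameter search inside an outer evaluation loop) maps directly onto scikit-learn; a minimal sketch on synthetic data follows, with the estimator, grid, fold counts, and data chosen purely for illustration.

```python
# Minimal nested cross-validation: GridSearchCV tunes hyperparameters in the
# inner folds; cross_val_score estimates generalization in the outer folds.
# Estimator, grid, fold counts, and data are illustrative choices only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1.0, 10.0]}, cv=3)
outer_scores = cross_val_score(inner, X, y, cv=5)  # outer loop
print(f"nested-CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```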
5
Nüst D, Eglen SJ. CODECHECK: an Open Science initiative for the independent execution of computations underlying research articles during peer review to improve reproducibility. F1000Research 2021; 10:253. [PMID: 34367614; PMCID: PMC8311796; DOI: 10.12688/f1000research.51738.2]
Abstract
The traditional scientific paper falls short of effectively communicating computational research. To help improve this situation, we propose a system by which the computational workflows underlying research articles are checked. The CODECHECK system uses open infrastructure and tools and can be integrated into review and publication processes in multiple ways. We describe these integrations along multiple dimensions (importance, who, openness, when). In collaboration with academic publishers and conferences, we demonstrate CODECHECK with 25 reproductions of diverse scientific publications. These CODECHECKs show that asking for reproducible workflows during a collaborative review can effectively improve executability. While CODECHECK has clear limitations, it may represent a building block in Open Science and publishing ecosystems for improving the reproducibility, appreciation, and, potentially, the quality of non-textual research artefacts. The CODECHECK website can be accessed here: https://codecheck.org.uk/.
Affiliation(s)
- Daniel Nüst
- Institute for Geoinformatics, University of Münster, Münster, Germany
- Stephen J Eglen
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
6
Edmunds SC, Goodman L. GigaByte: Publishing at the Speed of Research. GigaByte 2020; 2020:gigabyte1. [PMID: 36824595; PMCID: PMC9631982; DOI: 10.46471/gigabyte.1]
Abstract
Current practices in scientific publishing are unsuitable for rapidly changing fields and for presenting updatable data sets and software tools. In this regard, and as part of our continuing pursuit of pushing scientific publishing to match the needs of modern research, we are delighted to announce the launch of GigaByte, an online open-access, open-data journal that aims to be a new way to publish research following the software paradigm: CODE, RELEASE, FORK, UPDATE, and REPEAT. Following the success of GigaScience in promoting data sharing and reproducibility of research, its new sister journal, GigaByte, aims to take this even further. With a focus on short articles, a questionnaire-style review process, and custom-built publishing infrastructure from River Valley Technologies, we now have a cutting-edge, XML-first publishing platform designed specifically to make the entire publication process easier, quicker, more interactive, and better suited to the speed needed to communicate modern research.
Affiliation(s)
- Scott C Edmunds
- GigaScience, BGI Hong Kong Tech Co Ltd., 26F A Kings Wing Plaza, 1 On Kwan Street, Shek Mun, Sha Tin, NT, Hong Kong, China
- Laurie Goodman
- GigaScience, BGI Hong Kong Tech Co Ltd., 26F A Kings Wing Plaza, 1 On Kwan Street, Shek Mun, Sha Tin, NT, Hong Kong, China
7
Piccolo SR, Lee TJ, Suh E, Hill K. ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data. GigaScience 2020; 9:giaa026. [PMID: 32249316; PMCID: PMC7131989; DOI: 10.1093/gigascience/giaa026]
Abstract
BACKGROUND Classification algorithms assign observations to groups based on patterns in data. The machine-learning community has developed myriad classification algorithms, which are used in diverse life science research domains. Algorithm choice can affect classification accuracy dramatically, so it is crucial that researchers optimize the choice of which algorithm(s) to apply in a given research domain on the basis of empirical evidence. In benchmark studies, multiple algorithms are applied to multiple datasets, and the researcher examines overall trends. In addition, the researcher may evaluate multiple hyperparameter combinations for each algorithm and use feature selection to reduce data dimensionality. Although software implementations of classification algorithms are widely available, robust benchmark comparisons are difficult to perform when researchers wish to compare algorithms that span multiple software packages: programming interfaces, data formats, and evaluation procedures differ across packages, and dependency conflicts may arise during installation. FINDINGS To address these challenges, we created ShinyLearner, an open-source project for integrating machine-learning packages into software containers. ShinyLearner provides a uniform interface for performing classification, irrespective of the library that implements each algorithm, thus facilitating benchmark comparisons. In addition, ShinyLearner enables researchers to optimize hyperparameters and select features via nested cross-validation; it tracks all nested operations and generates output files that make these steps transparent. ShinyLearner includes a Web interface to help users more easily construct the commands necessary to perform benchmark comparisons. ShinyLearner is freely available at https://github.com/srp33/ShinyLearner. CONCLUSIONS This software is a resource for researchers who wish to benchmark multiple classification or feature-selection algorithms on a given dataset. We hope it will serve as an example of combining the benefits of software containerization with a user-friendly approach.
Affiliation(s)
- Stephen R Piccolo
- Department of Biology, Brigham Young University, 4102 Life Sciences Building, Provo, UT, 84602, USA
- Terry J Lee
- Department of Biology, Brigham Young University, 4102 Life Sciences Building, Provo, UT, 84602, USA
- Erica Suh
- Department of Biology, Brigham Young University, 4102 Life Sciences Building, Provo, UT, 84602, USA
- Kimball Hill
- Department of Biology, Brigham Young University, 4102 Life Sciences Building, Provo, UT, 84602, USA