1. Liang J, Wang C, Zhang D, Xie Y, Zeng Y, Li T, Zuo Z, Ren J, Zhao Q. VSOLassoBag: a variable-selection oriented LASSO bagging algorithm for biomarker discovery in omic-based translational research. J Genet Genomics 2023; 50:151-162. [PMID: 36608930 DOI: 10.1016/j.jgg.2022.12.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/25/2022] [Accepted: 12/26/2022] [Indexed: 01/04/2023]
Abstract
Screening biomolecular markers from high-dimensional biological data is a long-standing task in biomedical translational research. Because it combines feature shrinkage with biological interpretability, the Least Absolute Shrinkage and Selection Operator (LASSO) is one of the most popular methods for clinical biomarker development. In practice, however, applying LASSO to high-dimensional, low-sample-size omics data often yields an excessive number of predictive variables, which leads to overfitting. Here, we present VSOLassoBag, a wrapped LASSO approach that integrates an ensemble learning strategy to help select efficient and stable variables with high confidence from omics data. Using a bagging strategy combined with either a parametric method or an inflection point search method, VSOLassoBag integrates and votes on the variables selected by multiple LASSO models to determine the optimal candidates. Applying VSOLassoBag to both simulated and real-world datasets shows that the algorithm can effectively identify markers for either case-control binary classification or prognosis prediction. In addition, compared with multiple existing algorithms, VSOLassoBag achieves comparable performance under different scenarios while selecting fewer features. In summary, VSOLassoBag, which is available at https://seqworld.com/VSOLassoBag/ under the GPL v3 license, provides an alternative strategy for selecting reliable biomarkers from high-dimensional omics data. For users' convenience, we implement VSOLassoBag as an R package that supports multithreaded computing configurations.
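The bagging-and-voting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the VSOLassoBag implementation (which the abstract describes as an R package); it is a toy version under stated assumptions: a plain coordinate-descent LASSO without intercept, a fixed penalty `lam`, and a simple frequency threshold standing in for the parametric or inflection point cutoff. All function names and parameter values here are hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta initially zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def lasso_bag_frequencies(X, y, lam, n_bags=30, seed=0):
    """Fit LASSO on bootstrap resamples; return each variable's selection frequency."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1  # one vote per bag in which variable j is selected
    return [c / n_bags for c in counts]


def select_by_frequency(freq, threshold=0.8):
    """Assumed cutoff rule: keep variables selected in at least `threshold` of bags."""
    return [j for j, f in enumerate(freq) if f >= threshold]


def make_toy_data(n=60, p=10, seed=1):
    """Synthetic data where only variables 0 and 1 carry signal (illustration only)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    return X, y
```

Calling `lasso_bag_frequencies(*make_toy_data(), lam=0.5)` and passing the result to `select_by_frequency` gives the variables that survive the vote. In a realistic setting the penalty would typically be tuned per resample (for example by cross-validation), and the cutoff would be chosen from the observed frequency distribution rather than fixed in advance.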
Affiliation(s)
- Jiaqi Liang
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China; State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Chaoye Wang
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Di Zhang
  - Department of Coloproctology Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, Guangdong Institute of Gastroenterology, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong 510655, China
- Yubin Xie
  - Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong 510060, China
- Yanru Zeng
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Tianqin Li
  - Computer Science Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Zhixiang Zuo
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Jian Ren
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Qi Zhao
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
2. Feature selection using Information Gain and decision information in neighborhood decision system. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
3. Feature selection using relative dependency complement mutual information in fitting fuzzy rough set model. Appl Intell 2023. [DOI: 10.1007/s10489-022-04445-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
4. Qu K, Xu J, Han Z, Xu S. Maximum relevance minimum redundancy-based feature selection using rough mutual information in adaptive neighborhood rough sets. Appl Intell 2023. [DOI: 10.1007/s10489-022-04398-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
5. Feature selection based on double-hierarchical and multiplication-optimal fusion measurement in fuzzy neighborhood rough sets. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
7. Feature selection using self-information uncertainty measures in neighborhood information systems. Appl Intell 2022. [DOI: 10.1007/s10489-022-03760-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
8. Xu J, Qu K, Meng X, Sun Y, Hou Q. Feature selection based on multiview entropy measures in multiperspective rough set. Int J Intell Syst 2022. [DOI: 10.1002/int.22878] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Affiliation(s)
- Jiucheng Xu
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Kanglin Qu
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Xiangru Meng
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Yuanhao Sun
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Qincheng Hou
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China