1
|
Risk Stratification for Breast Cancer Patient by Simultaneous Learning of Molecular Subtype and Survival Outcome Using Genetic Algorithm-Based Gene Set Selection. Cancers (Basel) 2022; 14:cancers14174120. [PMID: 36077657 PMCID: PMC9454699 DOI: 10.3390/cancers14174120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Revised: 08/18/2022] [Accepted: 08/20/2022] [Indexed: 11/26/2022] Open
Abstract
Simple Summary Patient stratification is clinically important because it allows us to understand the characteristics and establish treatment strategies for a group. Transcriptomic data play an important role in determining molecular subtypes and predicting survival. In the case of breast cancer, although the order of prognosis according to molecular subtypes is well known, there is heterogeneity even within a subtype. Therefore, patient stratification considering both molecular subtypes and survival outcomes is required. In this study, a methodology to handle this problem is presented. A genetic algorithm is used to select a set of genes, and a risk score is assigned to each patient using their expression level. According to the risk score, patients are ordered and stratified considering molecular subtypes and survival outcomes. Consequently, informative genes for patient stratification with respect to both aspects could be nominated, and the usefulness of the risk score was shown through comparison with other indicators. Abstract Patient stratification is a clinically important task because it allows us to establish and develop efficient treatment strategies for particular groups of patients. Molecular subtypes have been successfully defined using transcriptomic profiles, and they are used effectively in clinical practice, e.g., PAM50 subtypes of breast cancer. Survival prediction contributed to understanding diseases and also identifying genes related to prognosis. It is desirable to stratify patients considering these two aspects simultaneously. However, there are no methods for patient stratification that consider molecular subtypes and survival outcomes at once. Here, we propose a methodology to deal with the problem. A genetic algorithm is used to select a gene set from transcriptome data, and their expression quantities are utilized to assign a risk score to each patient. The patients are ordered and stratified according to the score. A gene set was selected by our method on a breast cancer cohort (TCGA-BRCA), and we examined its clinical utility using an independent cohort (SCAN-B). In this experiment, our method was successful in stratifying patients with respect to both molecular subtype and survival outcome. We demonstrated that the orders of patients were consistent across repeated experiments, and prognostic genes were successfully nominated. Additionally, it was observed that the risk score can be used to evaluate the molecular aggressiveness of individual patients.
Collapse
|
2
|
A novel method for financial distress prediction based on sparse neural networks with $$L_{1/2}$$ regularization. INT J MACH LEARN CYB 2022; 13:2089-2103. [PMID: 35492262 PMCID: PMC9044388 DOI: 10.1007/s13042-022-01566-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Accepted: 04/06/2022] [Indexed: 11/25/2022]
Abstract
Corporate financial distress is related to the interests of the enterprise and stakeholders. Therefore, its accurate prediction is of great significance to avoid huge losses from them. Despite significant effort and progress in this field, the existing prediction methods are either limited by the number of input variables or restricted to those financial predictors. To alleviate those issues, both financial variables and non-financial variables are screened out from the existing accounting and finance theory to use as financial distress predictors. In addition, a novel method for financial distress prediction (FDP) based on sparse neural networks is proposed, namely FDP-SNN, in which the weight of the hidden layer is constrained with \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$L_{1/2}$$\end{document}L1/2 regularization to achieve the sparsity, so as to select relevant and important predictors, improving the predicted accuracy. It also provides support for the interpretability of the model. The results show that non-financial variables, such as investor protection and governance structure, play a key role in financial distress prediction than those financial ones, especially when the forecast period grows longer. By comparing those classic models proposed by predominant researchers in accounting and finance, the proposed model outperforms in terms of accuracy, precision, and AUC performance.
Collapse
|
3
|
Battery Sizing Optimization in Power Smoothing Applications. ENERGIES 2022. [DOI: 10.3390/en15030729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/10/2022]
Abstract
The main objective of this work was to determine the worth of installing an electrical battery in order to reduce peak power consumption. The importance of this question resides in the expensive terms of energy bills when using the maximum power level. If maximum power consumption decreases, it affects not only the revenues of maximum power level bills, but also results in important reductions at the source of the power. This way, the power of the transformer decreases, and other electrical elements can be removed from electrical installations. The authors studied the Spanish electrical system, and a particle swarm optimization (PSO) algorithm was used to model battery sizing in peak power smoothing applications for an electrical consumption point. This study proves that, despite not being entirely profitable at present due to current kWh prices, implanting a battery will definitely be an option to consider in the future when these prices come down.
Collapse
|
4
|
Abstract
Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in the fields of bioinformatics, text mining and image classification. However, while both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has been thus far conducted on which approaches might be best suited to deal with datasets that are class-imbalanced and high-dimensional at the same time (i.e., with a large number of features). This work attempts to give a contribution to this challenging research area by studying the effectiveness of hybrid learning strategies that involve the integration of feature selection techniques, to reduce the data dimensionality, with proper methods that cope with the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods are considered). Extensive experiments have been carried out across datasets from different domains, leveraging a well-known classifier, the Random Forest, which has proven to be effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results give evidence of the benefits of such a hybrid approach, when compared to using only feature selection or imbalance learning methods alone.
Collapse
|
5
|
Wang L, Li J, Liu J, Chang M. RAMRSGL: A Robust Adaptive Multinomial Regression Model for Multicancer Classification. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5584684. [PMID: 34122617 PMCID: PMC8172296 DOI: 10.1155/2021/5584684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 05/12/2021] [Indexed: 11/17/2022]
Abstract
In view of the challenges of the group Lasso penalty methods for multicancer microarray data analysis, e.g., dividing genes into groups in advance and biological interpretability, we propose a robust adaptive multinomial regression with sparse group Lasso penalty (RAMRSGL) model. By adopting the overlapping clustering strategy, affinity propagation clustering is employed to obtain each cancer gene subtype, which explores the group structure of each cancer subtype and merges the groups of all subtypes. In addition, the data-driven weights based on noise are added to the sparse group Lasso penalty, combining with the multinomial log-likelihood function to perform multiclassification and adaptive group gene selection simultaneously. The experimental results on acute leukemia data verify the effectiveness of the proposed method.
Collapse
Affiliation(s)
- Lei Wang
- Department of Basic Science Teaching, Henan Polytechnic Institute, Nanyang, 473000 Henan, China
| | - Juntao Li
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007 Henan, China
| | - Juanfang Liu
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007 Henan, China
| | - Mingming Chang
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007 Henan, China
| |
Collapse
|
6
|
Li L, Liu ZP. Biomarker discovery for predicting spontaneous preterm birth from gene expression data by regularized logistic regression. Comput Struct Biotechnol J 2020; 18:3434-3446. [PMID: 33294138 PMCID: PMC7689379 DOI: 10.1016/j.csbj.2020.10.028] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2020] [Revised: 10/24/2020] [Accepted: 10/25/2020] [Indexed: 01/23/2023] Open
Abstract
In this work, we provide a computational method of regularized logistic regression for discovering biomarkers of spontaneous preterm birth (SPTB) from gene expression data. The successful identification of SPTB biomarkers will greatly benefit the interference of infant gestational age for reducing the risks of pregnant women and preemies. In recent years, various approaches have been proposed for the feature selection of identifying the subset of meaningful genes that can achieve accurate classification for disease samples from controls. Here, we comprehensively summarize the regularized logistic regression with seven effective penalties developed for the selection of strongly indicative genes of SPTB from microarray data. We compare their properties and assess their classification performances in multiple datasets. It shows that elastic net, lasso,L 1 / 2 and SCAD penalties get the better performance than others and can be successfully used to identify biomarkers of SPTB. Particularly, we make a functional enrichment analysis on these biomarkers and construct a logistic regression classifier based on them. The classifier generates an indicator of preterm risk score (PRS) for predicting SPTB. Based on the trained predictor, we verify the identified biomarkers on an independent dataset. The biomarkers achieve the AUC value of 0.933 in the SPTB classification. The results demonstrate the effectiveness and efficiency of the built-up strategy of biomarker discovery with regularized logistic regression. Obviously, the proposed method of discovering biomarkers for SPTB can be easily extended for other complex diseases.
Collapse
Affiliation(s)
- Lingyu Li
- Center for Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Zhi-Ping Liu
- Center for Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| |
Collapse
|
7
|
Infrared Infusion Monitor Based on Data Dimensionality Reduction and Logistics Classifier. Processes (Basel) 2020. [DOI: 10.3390/pr8040437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
This paper presents an infrared infusion monitoring method based on data dimensionality reduction and a logistics classifier. In today’s social environment, nurses with hospital infusion work are under excessive pressure. In order to improve the information level of the traditional medical process, hospitals have introduced a variety of infusion monitoring devices. The current infusion monitoring equipment mainly adopts the detection method of infrared liquid drop detection to realize non-contact measurements. However, a large number of experiments have found that the traditional infrared detection method has the problems of low voltage signal amplitude variation and low signal-to-noise ratio (SNR). Conventional threshold judgment or signal shaping cannot accurately judge whether droplets exist or not, and complex signal processing circuits can greatly increase the cost and power consumption of equipment. In order to solve these problems, this paper proposes a method for the accurate measurement of droplets without increasing the cost, that is, a method combining data drop and a logistics classifier. The dimensionalized data and time information are input into the logistics classifier to judge the drop landing. The test results show that this method can significantly improve the accuracy of droplet judgment without increasing the hardware cost.
Collapse
|
8
|
Hamamoto R, Komatsu M, Takasawa K, Asada K, Kaneko S. Epigenetics Analysis and Integrated Analysis of Multiomics Data, Including Epigenetic Data, Using Artificial Intelligence in the Era of Precision Medicine. Biomolecules 2019; 10:62. [PMID: 31905969 PMCID: PMC7023005 DOI: 10.3390/biom10010062] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2019] [Revised: 12/20/2019] [Accepted: 12/27/2019] [Indexed: 12/14/2022] Open
Abstract
To clarify the mechanisms of diseases, such as cancer, studies analyzing genetic mutations have been actively conducted for a long time, and a large number of achievements have already been reported. Indeed, genomic medicine is considered the core discipline of precision medicine, and currently, the clinical application of cutting-edge genomic medicine aimed at improving the prevention, diagnosis and treatment of a wide range of diseases is promoted. However, although the Human Genome Project was completed in 2003 and large-scale genetic analyses have since been accomplished worldwide with the development of next-generation sequencing (NGS), explaining the mechanism of disease onset only using genetic variation has been recognized as difficult. Meanwhile, the importance of epigenetics, which describes inheritance by mechanisms other than the genomic DNA sequence, has recently attracted attention, and, in particular, many studies have reported the involvement of epigenetic deregulation in human cancer. So far, given that genetic and epigenetic studies tend to be accomplished independently, physiological relationships between genetics and epigenetics in diseases remain almost unknown. Since this situation may be a disadvantage to developing precision medicine, the integrated understanding of genetic variation and epigenetic deregulation appears to be now critical. Importantly, the current progress of artificial intelligence (AI) technologies, such as machine learning and deep learning, is remarkable and enables multimodal analyses of big omics data. In this regard, it is important to develop a platform that can conduct multimodal analysis of medical big data using AI as this may accelerate the realization of precision medicine. In this review, we discuss the importance of genome-wide epigenetic and multiomics analyses using AI in the era of precision medicine.
Collapse
Affiliation(s)
- Ryuji Hamamoto
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (M.K.); (K.T.); (K.A.); (S.K.)
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Masaaki Komatsu
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (M.K.); (K.T.); (K.A.); (S.K.)
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Ken Takasawa
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (M.K.); (K.T.); (K.A.); (S.K.)
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Ken Asada
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (M.K.); (K.T.); (K.A.); (S.K.)
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Syuzo Kaneko
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (M.K.); (K.T.); (K.A.); (S.K.)
| |
Collapse
|