1
Fang S, Ni H, Zhang Q, Dai J, He S, Min J, Zhang W, Li H. Integrated single-cell and bulk RNA sequencing analysis reveal immune-related biomarkers in postmenopausal osteoporosis. Heliyon 2024; 10:e38022. [PMID: 39328516 PMCID: PMC11425179 DOI: 10.1016/j.heliyon.2024.e38022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 09/14/2024] [Accepted: 09/16/2024] [Indexed: 09/28/2024] Open
Abstract
Background Postmenopausal osteoporosis (PMOP) represents a significant health concern, particularly as the population ages. Currently, there is a paucity of comprehensive descriptions of the immunoregulatory mechanisms and early diagnostic biomarkers associated with PMOP. This study aims to examine immune-related differentially expressed genes (IR-DEGs) in the peripheral blood mononuclear cells of PMOP patients to identify immunological patterns and diagnostic biomarkers. Methods The GSE56815 dataset from the Gene Expression Omnibus (GEO) database was used as the training group, while the GSE2208 dataset served as the validation group. Initially, differential expression analysis was conducted after data integration to identify IR-DEGs in the peripheral blood mononuclear cells of PMOP patients. Subsequently, feature selection of these IR-DEGs was performed using random forest (RF), support vector machine recursive feature elimination (SVM-RFE), and LASSO regression models. Additionally, the expression of IR-DEGs in distinct bone marrow cell subtypes was analyzed using single-cell RNA sequencing (scRNA-seq) datasets, allowing the identification of cellular communication patterns within various cell subgroups. Finally, molecular subtypes and diagnostic models for PMOP were constructed based on the selected IR-DEGs. Furthermore, the expression levels of characteristic IR-DEGs were examined in rat osteoporosis (OP) models. Results Using machine learning, six IR-DEGs (JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5) were identified. Subsequently, two molecular subtypes of PMOP (subtype 1 and subtype 2) were established, with subtype 1 exhibiting a higher proportion of M1 macrophage infiltration. Analysis of the scRNA-seq dataset revealed 11 distinct cell clusters. JUN was significantly overexpressed in M1 macrophages, while HMOX1 showed a marked elevation in endothelial cells and M2 macrophages. Cell communication results suggested that the PMOP microenvironment features increased interactions among M2 macrophages, CD8+ T cells, Tregs, and fibroblasts. The diagnostic model based on these six IR-DEGs demonstrated excellent diagnostic performance (AUC = 0.927). In the OP rat model, the expression of IL1R2 and TNFSF8 was significantly elevated. Conclusion JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5 may serve as promising molecular targets for diagnosing and subtyping patients with PMOP. These results offer novel perspectives on the early diagnosis of PMOP and the advancement of personalized immune-based therapies.
Affiliation(s)
- Shenyun Fang
- Department of Orthopedic Surgery, First People's Hospital of Huzhou, The First affiliated Hospital of Huzhou University, Huzhou, 313000, China
- Huzhou Key Laboratory for Early Diagnosis and Treatment of Osteoarthritis, Huzhou, 313000, China
- Haonan Ni
- Department of Orthopedic Surgery, First People's Hospital of Huzhou, The First affiliated Hospital of Huzhou University, Huzhou, 313000, China
- Qianghua Zhang
- Department of Orthopedic Surgery, First People's Hospital of Huzhou, The First affiliated Hospital of Huzhou University, Huzhou, 313000, China
- Huzhou Key Laboratory for Early Diagnosis and Treatment of Osteoarthritis, Huzhou, 313000, China
- Jilin Dai
- Department of Orthopedic Surgery, First People's Hospital of Huzhou, The First affiliated Hospital of Huzhou University, Huzhou, 313000, China
- Huzhou Key Laboratory for Early Diagnosis and Treatment of Osteoarthritis, Huzhou, 313000, China
- Shouyu He
- Department of Orthopedic Surgery, First People's Hospital of Huzhou, The First affiliated Hospital of Huzhou University, Huzhou, 313000, China
- Huzhou Key Laboratory for Early Diagnosis and Treatment of Osteoarthritis, Huzhou, 313000, China
- Jikang Min
- Department of Orthopedic Surgery, First People's Hospital of Huzhou, The First affiliated Hospital of Huzhou University, Huzhou, 313000, China
- Huzhou Key Laboratory for Early Diagnosis and Treatment of Osteoarthritis, Huzhou, 313000, China
- Weili Zhang
- Department of Ophthalmology, First People's Hospital of Huzhou, The First affiliated Hospital of Huzhou University, Huzhou, 313000, China
- Haidong Li
- Department of Orthopedic Surgery, First People's Hospital of Huzhou, The First affiliated Hospital of Huzhou University, Huzhou, 313000, China
- Huzhou Key Laboratory for Early Diagnosis and Treatment of Osteoarthritis, Huzhou, 313000, China
2
Yang J, Wang C, Wang Z, Li Y, Yu H, Feng J, Xie S, Li X. Distribution patterns and co-occurrence network of eukaryotic algae in different salinity waters of Yuncheng Salt Lake, China. Sci Rep 2024; 14:8340. [PMID: 38594439 PMCID: PMC11003963 DOI: 10.1038/s41598-024-58636-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 04/01/2024] [Indexed: 04/11/2024] Open
Abstract
The community structure and co-occurrence pattern of eukaryotic algae in Yuncheng Salt Lake were analyzed based on marker gene analysis of the 18S rRNA V4 region to understand the species composition and their synergistic adaptations to environmental factors in waters of different salinity. The results indicated that the overall algal composition of Yuncheng Salt Lake showed a Chlorophyta-Pyrrophyta-Bacillariophyta type structure. Chlorophyta showed an absolute advantage in all salinity waters. In addition, Cryptophyta dominated in the least saline waters; Pyrrophyta and Bacillariophyta were the dominant phyla in the waters with salinity ranging from 13.2 to 18%. Picochlorum, Nannochloris, Ulva, and Tetraselmis of Chlorophyta, Biecheleria and Oxyrrhis of Pyrrophyta, Halamphora, Psammothidium, and Navicula of Bacillariophyta, and Guillardia and Rhodomonas of Cryptophyta were not observed in previous surveys of the Yuncheng Salt Lake, suggesting that the algae are undergoing constant turnover as the water environment of the Salt Lake continues to change. The network diagram demonstrated that the algae were strongly influenced by salinity, NO3-, and pH; changes in these environmental factors would lead to changes in the algal community structure, thus affecting the stability of the network structure.
Affiliation(s)
- Jing Yang
- Shanxi Key Laboratory of Yuncheng Salt Lake Ecological Protection and Resource Utilization, College of Life Sciences, Yuncheng University, Yuncheng, 044000, China
- Chuanxu Wang
- Shanxi Key Laboratory of Yuncheng Salt Lake Ecological Protection and Resource Utilization, College of Life Sciences, Yuncheng University, Yuncheng, 044000, China
- Zhuo Wang
- Shanxi Key Laboratory of Yuncheng Salt Lake Ecological Protection and Resource Utilization, College of Life Sciences, Yuncheng University, Yuncheng, 044000, China
- Yunjie Li
- Shanxi Key Laboratory of Yuncheng Salt Lake Ecological Protection and Resource Utilization, College of Life Sciences, Yuncheng University, Yuncheng, 044000, China
- Huiying Yu
- Shanxi Key Laboratory of Yuncheng Salt Lake Ecological Protection and Resource Utilization, College of Life Sciences, Yuncheng University, Yuncheng, 044000, China
- Jia Feng
- School of Life Science, Shanxi University, Taiyuan, 030006, China
- Shulian Xie
- School of Life Science, Shanxi University, Taiyuan, 030006, China
- Xin Li
- Shanxi Key Laboratory of Yuncheng Salt Lake Ecological Protection and Resource Utilization, College of Life Sciences, Yuncheng University, Yuncheng, 044000, China.
3
Büyükakın F, Özyılmaz A, Işık E, Bayraktar Y, Olgun MF, Toprak M. Pandemics, Income Inequality, and Refugees: The Case of COVID-19. Social Work in Public Health 2024; 39:78-92. [PMID: 38372287 DOI: 10.1080/19371918.2024.2318372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Refugees are more vulnerable to COVID-19 due to factors such as a low standard of living, accommodation in crowded households, difficulty in receiving health care due to high treatment costs in some countries, and inability to access public health and social services. Increasing income inequalities, anxiety about securing minimum living conditions, and fear of unemployment compel refugees to continue working, and this affects the number of cases and case-related deaths. The aim of the study is to analyze the impact of refugees and income inequality on COVID-19 cases and deaths in 95 countries for the year 2021 using Poisson regression, negative binomial regression, and machine learning methods. According to the estimation results, refugees and income inequalities increase both COVID-19 cases and deaths. However, the impact of income inequality on COVID-19 cases and deaths is stronger than that of refugees.
Affiliation(s)
- Figen Büyükakın
- Department of Economics, University of Kocaeli, Kocaeli, Turkey
- Ayfer Özyılmaz
- Department of Public Finance, University of Kırıkkale, Kırıkkale, Turkey
- Esme Işık
- Department of Optician, Malatya Turgut Özal University, Malatya, Turkey
- Mehmet Firat Olgun
- The Department of Technology Transfer, University of Kastamonu, Kastamonu, Turkey
- Metin Toprak
- Department of Economics, Halic University, Istanbul, Turkey
4
Song W, Chen Y, Qin L, Xu X, Sun Y, Zhong M, Lu Y, Hu K, Wei L, Chen J. Oxidative stress drives vascular smooth muscle cell damage in acute Stanford type A aortic dissection through HIF-1α/HO-1 mediated ferroptosis. Heliyon 2023; 9:e22857. [PMID: 38125409 PMCID: PMC10730757 DOI: 10.1016/j.heliyon.2023.e22857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2023] [Revised: 11/16/2023] [Accepted: 11/21/2023] [Indexed: 12/23/2023] Open
Abstract
Background Acute Stanford type A aortic dissection (ATAAD) is characterized by intimal tearing and formation of a false lumen containing large amounts of heme-bearing erythrocytes. Heme oxygenase 1 (HO-1) is the key enzyme that degrades heme, leading to iron accumulation and subsequent ferroptosis. The current study aimed to investigate the role of HO-1 in the dissection progression of ATAAD. Methods Bioinformatic analyses and experimental validation were performed to reveal ferroptosis and HO-1 expression in ATAAD. Human aortic vascular smooth muscle cells (HA-VSMCs) were used to explore the underlying molecular mechanisms and the role of HO-1 overexpression in ATAAD. Results Ferroptosis was identified as a critical mode of regulated cell death in ATAAD. HO-1 was screened as a key signature of ferroptosis in ATAAD and was closely associated with oxidative stress. Single cell/nucleus transcriptomic analysis and histological staining revealed that HO-1 and HIF-1α were upregulated in vascular smooth muscle cells (VSMCs) of ATAAD. Further in vitro experiments showed that H2O2-induced oxidative stress increased VSMC ferroptosis with the overexpression of HO-1, which could be suppressed by the HIF-1α inhibitor PX-478. HIF-1α could transcriptionally regulate the expression of HO-1 by binding to its promoter region. Pharmacological inhibition of HO-1 by zinc protoporphyrin (ZnPP) did not reduce H2O2-induced HA-VSMC damage without heme co-incubation. However, H2O2-induced HA-VSMC damage worsened when heme was added to the medium, and ZnPP could reduce HA-VSMC damage under this condition. Conclusion HO-1 is a key signature of VSMC ferroptosis in ATAAD. HIF-1α/HO-1-mediated ferroptosis might participate in oxidative stress-induced VSMC damage.
Affiliation(s)
- Wenyu Song
- Department of Cardiovascular Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Yifu Chen
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
- Lieyang Qin
- Department of Cardiovascular Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Xinyuan Xu
- The Second Clinical Medical School, Nanjing Medical University, Nanjing 210029, China
- Yu Sun
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
- Mingzhu Zhong
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
- Yuntao Lu
- Department of Cardiovascular Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Kui Hu
- Department of Cardiovascular Surgery, Guizhou Provincial People's Hospital, Guiyang 550002, China
- Lai Wei
- Department of Cardiovascular Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Jinmiao Chen
- Department of Cardiovascular Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China
5
Haq YU, Shahbaz M, Asif S, Ouahada K, Hamam H. Identification of Soil Types and Salinity Using MODIS Terra Data and Machine Learning Techniques in Multiple Regions of Pakistan. Sensors (Basel, Switzerland) 2023; 23:8121. [PMID: 37836951 PMCID: PMC10575389 DOI: 10.3390/s23198121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 09/10/2023] [Accepted: 09/21/2023] [Indexed: 10/15/2023]
Abstract
Soil, a significant natural resource, plays a crucial role in supporting various ecosystems and serves as the foundation of Pakistan's economy due to its primary use in agriculture. Hence, timely monitoring of soil type and salinity is essential. However, traditional methods for identifying soil types and detecting salinity are time-consuming, requiring expert intervention and extensive laboratory experiments. The objective of this study is to propose a model that leverages MODIS Terra data to identify soil types and detect soil salinity. To achieve this, 195 soil samples were collected from Lahore, Kot Addu, and Kohat between October 2022 and November 2022. Simultaneously, spectral data of the same regions were obtained to spatially map soil types and salinity of bare land. The spectral reflectance of band values, salinity indices, and vegetation indices were utilized to classify the soil types and predict soil salinity. To perform the classification and regression tasks, the study employed three popular techniques in the research community: Random Forest (RF), AdaBoost (AB), and Gradient Boosting (GB), along with Decision Tree (DT), K-Nearest Neighbor (KNN), and Extra Tree (ET). A 70-30 train-test split was used for the implementation of these techniques. The efficacy of the multi-class classification models for soil types was evaluated using accuracy, precision, recall, and f1-score. The regression models' performances were evaluated and compared using R-squared (R2), Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). The results demonstrated that Random Forest outperformed other methods for both predicting soil types (accuracy = 65.38, precision = 0.60, recall = 0.57, and f1-score = 0.57) and predicting salinity (R2 = 0.90, MAE = 0.56, MSE = 0.98, RMSE = 0.97). Finally, the study designed a web portal to enable real-time prediction of soil types and salinity using these models. This web portal can be utilized by farmers and decision-makers to make informed decisions regarding soil, crop cultivation, and agricultural planning.
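For readers unfamiliar with the regression metrics named in this abstract (R2, MAE, MSE, RMSE), the following is a minimal sketch of how they are conventionally computed from observed and predicted values. It is not the authors' code; the function name `regression_metrics` and the toy salinity values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE and RMSE for paired observed/predicted values.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    # R2 is undefined when the observed values have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"r2": r2, "mae": mae, "mse": mse, "rmse": rmse}


# Toy (made-up) observed and predicted salinity values.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics)
```

When all four metrics are computed on the same set of predictions, RMSE is by definition the square root of MSE, which gives a quick consistency check on reported values.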
Affiliation(s)
- Yasin Ul Haq
- Department of Computer Science, University of Engineering and Technology, Lahore 39161, Pakistan
- Muhammad Shahbaz
- Department of Computer Engineering, University of Engineering and Technology, Lahore 39161, Pakistan
- Shahzad Asif
- Department of Computer Science, New Campus, University of Engineering and Technology, Lahore 39161, Pakistan
- Khmaies Ouahada
- Department of Electrical and Electronic Engineering Science, School of Electrical Engineering, University of Johannesburg, Johannesburg 2006, South Africa
- Habib Hamam
- College of Computer Science and Engineering, University of Hail, Hail 55476, Saudi Arabia
6
Rossi N, Chiaraviglio M, Cardozo G. Behavioural plasticity in activity and sexual interactions in a social lizard at high environmental temperatures. PLoS One 2023; 18:e0285656. [PMID: 37494328 PMCID: PMC10370740 DOI: 10.1371/journal.pone.0285656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Accepted: 04/27/2023] [Indexed: 07/28/2023] Open
Abstract
Sexual selection often shapes social behavioural activities, such as movement in the environment to find possible partners, performance of displays to signal dominance, and courtship behaviours. Such activities may be negatively influenced by increasing temperatures, especially in ectotherms, because individuals either have to withstand the unfavourable condition or are forced to allocate more time to thermoregulation by increasing shelter-seeking behaviour. Thus, they "miss" opportunities for social and reproductive interactions. Moreover, behavioural displays of ectotherms closely depend on temperature; consequently, mate choice behaviours may be disrupted, ultimately modifying sexual selection patterns. Therefore, it would be interesting to elucidate how increasing temperatures associated with global warming may influence activity and social interactions in the species' natural habitat and, specifically, how high temperatures may modify intersexual interactions. Consequently, our aim was to explore differences in the daily pattern of social interactions in an ectotherm model, Tropidurus spinulosus, in two thermally different habitats and to determine how high temperatures modify mate choice. High environmental temperatures were found to be associated with a bimodal pattern in daily activity, which was closely linked to the daily variations in the thermal quality of the habitat, whereas the pattern and frequency of social displays showed less plasticity. The time allocated to mate choice generally decreased with increasing temperature since individuals increased the use of thermal refuges; this result supports the hypothesis of "missed opportunities". Moreover, at high temperatures, both sexes showed changes in mate selection dynamics, with females possibly "rushing" mate choice and males showing an increase in intermale variability of reproductive displays. In our ectotherm model, plastic adjustments in the behavioural activity pattern induced by high temperatures, plus the modification of the displays during courtship, may ultimately modify mate choice patterns and sexual selection dynamics.
Affiliation(s)
- Nicola Rossi
- Universidad Nacional de Córdoba, Facultad de Ciencias Exactas Físicas y Naturales, Laboratorio de Biología del Comportamiento, Córdoba, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Instituto de Diversidad y Ecología Animal (IDEA), Córdoba, Argentina
- Margarita Chiaraviglio
- Universidad Nacional de Córdoba, Facultad de Ciencias Exactas Físicas y Naturales, Laboratorio de Biología del Comportamiento, Córdoba, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Instituto de Diversidad y Ecología Animal (IDEA), Córdoba, Argentina
- Gabriela Cardozo
- Universidad Nacional de Córdoba, Facultad de Ciencias Exactas Físicas y Naturales, Laboratorio de Biología del Comportamiento, Córdoba, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Instituto de Diversidad y Ecología Animal (IDEA), Córdoba, Argentina
7
Huseby CJ, Delvaux E, Brokaw DL, Coleman PD. Blood RNA transcripts reveal similar and differential alterations in fundamental cellular processes in Alzheimer's disease and other neurodegenerative diseases. Alzheimers Dement 2023; 19:2618-2632. [PMID: 36541444 PMCID: PMC11633037 DOI: 10.1002/alz.12880] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/19/2022] [Revised: 09/30/2022] [Accepted: 10/21/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Dysfunctional processes in Alzheimer's disease and other neurodegenerative diseases lead to neural degeneration in the central and peripheral nervous system. Research demonstrates that neurodegeneration of any kind is a systemic disease that may even begin outside of the region vulnerable to the disease. Neurodegenerative diseases are defined by the vulnerabilities and pathology occurring in the regions affected. METHOD A random forest machine learning analysis on whole blood transcriptomes from six neurodegenerative diseases generated unbiased disease-classifying RNA transcripts subsequently subjected to pathway analysis. RESULTS We report that transcripts of the blood transcriptome selected for each of the neurodegenerative diseases represent fundamental biological cell processes including transcription regulation, degranulation, immune response, protein synthesis, apoptosis, cytoskeletal components, ubiquitylation/proteasome, and mitochondrial complexes that are also affected in the brain and reveal common themes across six neurodegenerative diseases. CONCLUSION Neurodegenerative diseases share common dysfunctions in fundamental cellular processes. Identifying regional vulnerabilities will reveal unique disease mechanisms. HIGHLIGHTS Transcriptomics offer information about dysfunctional processes. Comparing multiple diseases will expose unique malfunctions within diseases. Blood RNA can be used ante mortem to track expression changes in neurodegenerative diseases. Protocol standardization will make public datasets compatible.
Affiliation(s)
- Carol J. Huseby
- ASU-Banner Neurodegenerative Disease Research Center, Arizona State University, Tempe, Arizona, USA
- Elaine Delvaux
- ASU-Banner Neurodegenerative Disease Research Center, Arizona State University, Tempe, Arizona, USA
- Danielle L. Brokaw
- University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Paul D. Coleman
- ASU-Banner Neurodegenerative Disease Research Center, Arizona State University, Tempe, Arizona, USA
8
Dong H, Yan SB, Li GS, Huang ZG, Li DM, Tang YL, Le JQ, Pan YF, Yang Z, Pan HB, Chen G, Li MJ. Identification through machine learning of potential immune-related gene biomarkers associated with immune cell infiltration in myocardial infarction. BMC Cardiovasc Disord 2023; 23:163. [PMID: 36978012 PMCID: PMC10052851 DOI: 10.1186/s12872-023-03196-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 03/22/2023] [Indexed: 03/30/2023] Open
Abstract
BACKGROUND To investigate the potential role of immune-related genes (IRGs) and immune cells in myocardial infarction (MI) and establish a nomogram model for diagnosing myocardial infarction. METHODS Raw and processed gene expression profiling datasets were retrieved from the Gene Expression Omnibus (GEO) database. Differentially expressed immune-related genes (DIRGs), screened by four machine learning algorithms (partial least squares (PLS), random forest (RF), k-nearest neighbor (KNN), and support vector machine (SVM)), were used in the diagnosis of MI. RESULTS Six key DIRGs (PTGER2, LGR6, IL17B, IL13RA1, CCL4, and ADM) were identified from the intersection of the minimal root mean square error (RMSE) results of the four machine learning algorithms and were used to establish a nomogram model to predict the incidence of MI with the rms package. The nomogram model exhibited the highest predictive accuracy and better potential clinical utility. The relative distribution of 22 types of immune cells was evaluated using the Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) algorithm. Four types of immune cells (plasma cells, follicular helper T cells, resting mast cells, and neutrophils) were significantly increased in MI, while five types (naive CD4 T cells, M1 macrophages, M2 macrophages, resting dendritic cells, and activated mast cells) were significantly decreased in MI patients. CONCLUSION This study demonstrated that IRGs were correlated with MI, suggesting that immune cells may be potential therapeutic targets of immunotherapy in MI.
Affiliation(s)
- Hao Dong
- Department of Cardiovascular Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Shi-Bai Yan
- Department of Pathology/ Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Guo-Sheng Li
- Department of Cardiothoracic Surgery, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Zhi-Guang Huang
- Department of Pathology/ Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Dong-Ming Li
- Department of Pathology/ Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Yu-Lu Tang
- Department of Pathology/ Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Jia-Qian Le
- Department of Pathology/ Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Yan-Fang Pan
- Department of Pathology, Hospital of Guangxi Liugang Medical Co.LTD./Guangxi Liuzhou Dingshun Forensic Expert Institute, No.9, Queershan Rd, Liuzhou, Guangxi Zhuang Autonomous Region, 545002, People's Republic of China
- Zhen Yang
- Department of Gerontology, NO.923 Hospital of Chinese People's Liberation Army, No. 1 Tangcheng Rd, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Hong-Bo Pan
- Department of Pathology/ Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Gang Chen
- Department of Pathology/ Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China
- Ming-Jie Li
- Department of Pathology/ Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, 530021, People's Republic of China.
9
Zeng B, Chen Y, Chen H, Zhao Q, Sun Z, Liu D, Li X, Zhang Y, Wang J, Xing HR. Exosomal miR-211-5p regulates glucose metabolism, pyroptosis, and immune microenvironment of melanoma through GNA15. Pharmacol Res 2023; 188:106660. [PMID: 36642112 DOI: 10.1016/j.phrs.2023.106660] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/25/2022] [Revised: 12/26/2022] [Accepted: 01/11/2023] [Indexed: 01/15/2023]
Abstract
Despite the unprecedented advancement of cancer treatment, the prognosis for patients with metastatic cancer remains poor. The challenge that underlies this clinical dilemma is the complexity of metastasis. Conventional experiment-driven discovery approaches (the "wet lab") yield overly simplified one-to-one mechanistic relationships that are incapable of elucidating the complexity of metastasis. Metastasis research also suffers from the knowledge and skill deficiencies of individual investigators. The importance of the present study is the demonstration that the "dry-lab-driven discovery and wet-lab validation" approach can improve the efficiency of studying complex biological behaviors, and can yield more reliable, objective and comprehensive mechanistic findings that have clinical significance. Specifically, we applied this approach to study the mechanisms that underlie the involvement of exosomal miRNAs in transferring metastatic capability between heterogeneous melanoma cancer cells. We show that the highly metastatic melanoma tumor cells (POL) can transfer their metastatic competency to the low-metastatic melanoma tumor cells (OL) by exosomal miR-211-5p. The oncogenic activity of miR-211-5p is mediated by the target gene guanine nucleotide-binding protein subunit alpha-15 (GNA15), extrinsically through modifying the immune function of the tumor microenvironment and intrinsically through inhibiting pyroptosis and augmenting glycolysis within OL cells. In addition, we show that exosomal sorting of miR-211-5p is likely selective and is subject to regulation by a transcriptional feedback loop between miR-211-5p and zinc finger FYVE-type containing 26 (ZFYVE26). Furthermore, the "8-genes pyroptosis Risk model" derived from LASSO regression analysis was verified as an independent prognostic factor for melanoma.
Affiliation(s)
- Bin Zeng
- Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China.
- Yuting Chen
- State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, Chongqing 400016, China.
- Hao Chen
- State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, Chongqing 400016, China.
- Qiting Zhao
- Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Zhiwei Sun
- Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Doudou Liu
- State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, Chongqing 400016, China
- Xiaoshuang Li
- State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, Chongqing 400016, China
- Yuhan Zhang
- State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, Chongqing 400016, China
| | - Jianyu Wang
- Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China.
| | - H Rosie Xing
- State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, Chongqing 400016, China.
| |
|
10
|
Cheng N, Guo M, Yan F, Guo Z, Meng J, Ning K, Zhang Y, Duan Z, Han Y, Wang C. Application of machine learning in predicting aggressive behaviors from hospitalized patients with schizophrenia. Front Psychiatry 2023; 14:1016586. [PMID: 37020730 PMCID: PMC10067917 DOI: 10.3389/fpsyt.2023.1016586] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/11/2022] [Accepted: 03/01/2023] [Indexed: 04/07/2023] Open
Abstract
Objective To establish a predictive model of aggressive behaviors in hospitalized patients with schizophrenia by applying multiple machine learning algorithms, and to provide a reference for accurately predicting and preventing the occurrence of aggressive behaviors. Methods The cluster sampling method was used to select patients with schizophrenia who were hospitalized in our hospital from July 2019 to August 2021 as study participants, and they were divided into an aggressive behavior group (611 cases) and a non-aggressive behavior group (1,426 cases) according to whether they exhibited obvious aggressive behaviors during hospitalization. A self-administered General Condition Questionnaire, the Insight and Treatment Attitude Questionnaire (ITAQ), the Family APGAR (Adaptation, Partnership, Growth, Affection, Resolve) Questionnaire (APGAR), the Social Support Rating Scale Questionnaire (SSRS) and the Family Burden Scale of Disease Questionnaire (FBS) were used for the survey. The Multi-layer Perceptron, Lasso, Support Vector Machine and Random Forest algorithms were used to build predictive models for the occurrence of aggressive behaviors in hospitalized patients with schizophrenia and to evaluate their predictive effect. A nomogram was used to build a clinical application tool. Results The area under the receiver operating characteristic curve (AUC) values of the Multi-Layer Perceptron, Lasso, Support Vector Machine, and Random Forest were 0.904 (95% CI: 0.877-0.926), 0.901 (95% CI: 0.874-0.923), 0.902 (95% CI: 0.876-0.924), and 0.955 (95% CI: 0.935-0.970), respectively. The AUC of the Random Forest differed significantly from those of the remaining three models (p < 0.0001), whereas the remaining three models did not differ significantly in pairwise comparisons (p > 0.5). Conclusion Machine learning models can fairly predict aggressive behaviors in hospitalized patients with schizophrenia; among them, Random Forest has the best predictive effect and has some value in clinical application.
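The AUC values reported in this abstract can be read as the probability that a randomly chosen patient from the aggressive group receives a higher predicted score than a randomly chosen patient from the non-aggressive group. The following is a minimal sketch of that rank-based definition, not the authors' pipeline; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win.
    Hypothetical helper for illustration; not taken from the cited study.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = aggressive behavior observed, 0 = not observed.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a sort-based rank computation would be the usual choice for larger cohorts.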
Affiliation(s)
- Nuo Cheng
- Department of Clinical Medicine, Zhengzhou University, Zhengzhou, Henan, China
| | - Meihao Guo
- Department of Infection Prevention and Control, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China
| | - Fang Yan
- Department of Infection Prevention and Control, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China
| | - Zhengjun Guo
- Henan Mental Disease Prevention and Control Center, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China
| | - Jun Meng
- Editorial Department of Journal of Clinical Psychosomatic Diseases, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China
| | - Kui Ning
- Department of Medical Administration, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China
| | - Yanping Zhang
- Department of Medicine, Zhengzhou University, Zhengzhou, Henan, China
| | - Zitian Duan
- The Seventh Psychiatric Department, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China
| | - Yong Han
- Henan Key Laboratory of Biological Psychiatry, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China
- *Correspondence: Han Yong
| | - Changhong Wang
- Department of Clinical Psychiatry, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China
- *Correspondence: Wang Changhong
| |
|
11
|
Li J, Cui Y, Jin X, Ruan H, He D, Che X, Gao J, Zhang H, Guo J, Zhang J. Significance of pyroptosis-related gene in the diagnosis and classification of rheumatoid arthritis. Front Endocrinol (Lausanne) 2023; 14:1144250. [PMID: 37008939 PMCID: PMC10057543 DOI: 10.3389/fendo.2023.1144250] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/14/2023] [Accepted: 02/17/2023] [Indexed: 03/17/2023] Open
Abstract
BACKGROUND Rheumatoid arthritis (RA), a chronic autoimmune inflammatory disease, is often characterized by persistent morning stiffness, joint pain, and swelling. Early diagnosis and timely treatment of RA can effectively delay the progression of the condition and significantly reduce the incidence of disability. In this study, we explored the function of pyroptosis-related genes (PRGs) in the diagnosis and classification of RA based on Gene Expression Omnibus (GEO) datasets. METHOD We downloaded the GSE93272 dataset from the GEO database, which contains 35 healthy controls and 67 RA patients. First, the GSE93272 dataset was normalized with the R package "limma". Then, we screened PRGs using the SVM-RFE, LASSO, and RF algorithms. To further investigate the prevalence of RA, we established a nomogram model. In addition, we grouped gene expression profiles into two clusters and explored their relationship with infiltrating immune cells. Finally, we analyzed the relationship between the two clusters and cytokines. RESULT CHMP3, TP53, AIM2, NLRP1, and PLCG1 were identified as PRGs. The nomogram model revealed that decision-making based on the established model might be beneficial for RA patients, and the predictive power of the nomogram model was significant. In addition, we identified two different pyroptosis patterns (pyroptosis clusters A and B) based on the 5 PRGs. Eosinophils, gamma delta T cells, macrophages, natural killer cells, regulatory T cells, type 17 T helper cells, and type 2 T helper cells were significantly more abundant in cluster B. We also identified gene clusters A and B based on 56 differentially expressed genes (DEGs) between pyroptosis clusters A and B, and calculated a pyroptosis score for each sample to quantify the different patterns. Patients in pyroptosis cluster B or gene cluster B had higher pyroptosis scores than those in pyroptosis cluster A or gene cluster A.
CONCLUSION In summary, PRGs play vital roles in the occurrence and development of RA. Our findings might provide novel perspectives on immunotherapy strategies for RA.
Affiliation(s)
- Jian Li
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
| | - Yongfeng Cui
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
| | - Xin Jin
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
| | - Hongfeng Ruan
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
- Department of Orthopaedics, The First Affiliated Hospital of Zhejiang University of Chinese Medicine, Hangzhou, China
| | - Dongan He
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
| | - Xiaoqian Che
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
| | - Jiawei Gao
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
| | - Haiming Zhang
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
- *Correspondence: Haiming Zhang; Jiandong Guo; Jinxi Zhang
| | - Jiandong Guo
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
| | - Jinxi Zhang
- Department of Orthopaedics, Hangzhou Ninth People’s Hospital, Hangzhou, Zhejiang, China
| |
|
12
|
Huseby CJ, Delvaux E, Brokaw DL, Coleman PD. Blood Transcript Biomarkers Selected by Machine Learning Algorithm Classify Neurodegenerative Diseases including Alzheimer's Disease. Biomolecules 2022; 12:1592. [PMID: 36358942 PMCID: PMC9687215 DOI: 10.3390/biom12111592] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/21/2022] [Revised: 10/22/2022] [Accepted: 10/22/2022] [Indexed: 10/15/2023] Open
Abstract
The clinical diagnosis of neurodegenerative diseases is notoriously inaccurate, and current methods are often expensive, time-consuming, or invasive. Simple, inexpensive, and noninvasive methods of diagnosis could provide valuable support for clinicians when combined with cognitive assessment scores. Biological processes leading to neuropathology progress silently for years and are reflected in both the central nervous system and the vascular peripheral system. A blood-based screen to distinguish and classify neurodegenerative diseases is especially attractive because of its low cost, minimal invasiveness, and accessibility to almost any clinic in the world. In this study, we set out to discover a small set of blood transcripts that can be used to distinguish healthy individuals from those with Alzheimer's disease, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, Friedreich's ataxia, or frontotemporal dementia. Using existing public datasets, we developed a machine learning algorithm for application to transcripts present in blood and discovered small sets of transcripts that distinguish a number of neurodegenerative diseases with high sensitivity and specificity. We validated the usefulness of blood RNA transcriptomics for the classification of neurodegenerative diseases. Information about the features selected for classification can direct the development of possible treatment strategies.
Affiliation(s)
- Carol J. Huseby
- ASU-Banner Neurodegenerative Disease Research Center, Arizona State University, Tempe, AZ 85281, USA
| | - Elaine Delvaux
- ASU-Banner Neurodegenerative Disease Research Center, Arizona State University, Tempe, AZ 85281, USA
| | - Danielle L. Brokaw
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Paul D. Coleman
- ASU-Banner Neurodegenerative Disease Research Center, Arizona State University, Tempe, AZ 85281, USA
| |
|
13
|
Huang HC, Wu Y, Yang Q, Qin LX. PRECISION.array: An R Package for Benchmarking microRNA Array Data Normalization in the Context of Sample Classification. Front Genet 2022; 13:838679. [PMID: 35938023 PMCID: PMC9354575 DOI: 10.3389/fgene.2022.838679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Accepted: 06/10/2022] [Indexed: 11/13/2022] Open
Abstract
We present a new R package, PRECISION.array, for assessing the performance of data normalization methods in connection with methods for sample classification. It includes two microRNA microarray datasets for the same set of tumor samples, a re-sampling-based algorithm for simulating additional paired datasets under various designs of sample-to-array assignment and levels of signal-to-noise ratio, and a collection of numerical and graphical tools for method performance assessment. The package allows users to specify their own methods for normalization and classification, in addition to implementing three methods for training data normalization, seven methods for test data normalization, seven methods for classifier training, and two methods for classifier validation. It enables an objective and systematic evaluation of the operating characteristics of normalization and classification methods in microRNA microarrays. To our knowledge, this is the first such tool available. The R package can be downloaded freely at https://github.com/LXQin/PRECISION.array.
|
14
|
Landslide Susceptibility Research Combining Qualitative Analysis and Quantitative Evaluation: A Case Study of Yunyang County in Chongqing, China. FORESTS 2022. [DOI: 10.3390/f13071055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
Abstract
Machine learning-based methods are commonly used for landslide susceptibility mapping. Most recent publications have focused on quantitative analysis, i.e., improving data processing methods and comparing and refining the data-driven model itself, but have rarely taken the qualitative aspects of local landslide occurrences into consideration, and further analysis of the key features has generally been lacking. This study aims to combine qualitative and quantitative analysis and examine its effect on mapping accuracy; based on the feature importance ranks and the related literature, the key features for identifying landslide/non-landslide points in different sub-zones were further analyzed. Before modeling, the study area, Yunyang County, Chongqing City, China, was manually divided into four sub-zones based on information from geological hazard exploration in Chongqing, including the mechanisms of landslide formation and sliding failure and the characteristics of the geomorphic units. On the basis of this qualitative analysis, five grid-search-tuned random forest models (one for the whole region and one for each of the four sub-zones) were built using 1654 data points and 20 conditioning features. Compared with the conventional data-driven method, the integrated quantitative evaluation based on the qualitative analysis results showed higher reliability: it improved the mapping accuracy and increased the AUC values of all four sub-models, which were 8.8%, 2.3%, 1.9% and 9.1% higher than that of the parent model. Moreover, the quantitative evaluation based on the qualitative analysis revealed the key factors affecting local landslide formation. Therefore, combining qualitative analysis with data-driven methods is recommended for future landslide susceptibility modeling.
|
15
|
Association between Arsenic Level, Gene Expression in Asian Population, and In Vitro Carcinogenic Bladder Tumor. OXIDATIVE MEDICINE AND CELLULAR LONGEVITY 2022; 2022:3459855. [PMID: 35039759 PMCID: PMC8760535 DOI: 10.1155/2022/3459855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Accepted: 11/25/2021] [Indexed: 11/18/2022]
Abstract
The IARC classified arsenic (As) as "carcinogenic to humans." Despite the health consequences of arsenic exposure, there is no molecular signature available yet that can predict when exposure may lead to the development of disease. To understand the molecular processes underlying arsenic exposure and the risk of disease development, this study investigated the functional relationship between high arsenic exposure and disease risk using gene expression derived from human exposure. In this study, a three-step analysis was employed: (1) the gene expression profiles obtained from two diverse arsenic-exposed Asian populations were utilized to identify differentially expressed genes associated with arsenic exposure in human subjects, (2) the gene expression profiles induced by arsenic exposure in four different myeloma cancer cell lines were used to define common genes and pathways altered by arsenic exposure, and (3) the genetic profiles of two publicly available human bladder cancer studies were used to test the significance of the common association of genes, identified in step 1 and step 2, to develop and validate a predictive model of primary bladder cancer risk associated with arsenic exposure. Our analysis shows that arsenic exposure in humans is mainly associated with organismal injury and abnormalities, immunological disease, inflammatory disease, gastrointestinal disease, and increased rates of a wide variety of cancers. In addition, arsenic exerts its toxicity by generating reactive oxygen species (ROS); the increased ROS production causes an imbalance that leads to cell and tissue damage (oxidative stress). Oxidative stress activates inflammatory pathways that lead to the transformation of a normal cell into a tumor cell; specifically, there is significant evidence of advancing changes in oxidative/nitrative stress during the progression of bladder cancer.
Therefore, we examined the relationship between bladder cancer and the genes differentially expressed following arsenic exposure in humans, and developed a bladder cancer risk prediction model. In this study, integrin-linked kinase (ILK) was one of the most significant pathways identified in both arsenic-exposed populations; it plays a key role in eliciting a protective response to oxidative damage in epidermal cells. On the other hand, several studies have shown that arsenic trioxide (ATO) is useful for anticancer therapy, although the mechanisms underlying its paradoxical effects are still not well understood. ATO has shown remarkable efficacy in the treatment of multiple myeloma; therefore, it will be helpful to understand the underlying cancer biology by which ATO exerts its inhibitory effect on myeloma cells. Our study found that MAPK is one of the most active networks shared between the arsenic-associated genes and the ATO-treated cell lines; it is indicative of oxidative/nitrosative damage and is associated with the development of bladder cancer. The study identified a unique set of 147 genes associated with arsenic exposure and linked to molecular mechanisms of cancer. The risk prediction model shows the highest prediction ability for recurrent bladder tumors based on a very small subset (NKIRAS2, AKTIP, and HLA-DQA1) of the 147 genes, resulting in AUCs of 0.94 (95% CI: 0.744-0.995) and 0.75 (95% CI: 0.343-0.933) on training and validation data, respectively.
|
16
|
Pan C, Deng D, Wei T, Wu Z, Zhang B, Yuan Q, Liang G, Liu Y, Yin P. Metabolomics study identified bile acids as potential biomarkers for gastric cancer: A case control study. Front Endocrinol (Lausanne) 2022; 13:1039786. [PMID: 36465663 PMCID: PMC9715751 DOI: 10.3389/fendo.2022.1039786] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/08/2022] [Accepted: 11/03/2022] [Indexed: 11/21/2022] Open
Abstract
Gastric cancer (GC) is a common lethal malignancy worldwide. Gastroscopy is an effective screening technique for decreasing mortality. However, useful non-invasive markers for the early detection of GC remain limited. Bile acids are important molecules for the modulation of energy metabolism. With an in-depth targeted method for accurate quantitation of 80 bile acids (BAs), we aimed to find potential biomarkers for the early screening of GC. A cohort of 280 participants was enrolled, including 113 with GC, 22 with benign gastric lesions (BGL) and 145 healthy controls. Potential markers were identified using a random forest machine learning algorithm in the discovery cohort (n=180), then validated in an internal validation cohort (n=78) and a group of 22 participants with BGL. The results revealed significant alterations in the circulating BA pool between GC patients and controls. BAs also exhibited significant correlations with various clinical traits. We then developed a diagnostic panel comprising six BAs or ratios for GC detection. The panel showed high accuracy for the diagnosis of GC, with AUCs of 1 (95%CI: 1.00-1.00) and 0.98 (95%CI: 0.93-1.00) in the discovery and validation cohorts, respectively. This 6-BA panel was also able to identify early GC, with AUCs of 1 (95%CI: 0.999-1.00) and 0.94 (95%CI: 0.83-1.00) in the discovery and validation cohorts, respectively. In addition, this panel achieved a good differential diagnosis between GC and BGL, with an AUC of 0.873 (95%CI: 0.812-0.934). Alterations of serum bile acids are characteristic metabolic features of GC. Bile acids could be promising biomarkers for the early diagnosis of GC.
Affiliation(s)
- Chen Pan
- Department of General Surgery, First Affiliated Hospital of Dalian Medical University, Dalian, China
- Clinical Laboratory of Integrative Medicine, First Affiliated Hospital of Dalian Medical University, Dalian, China
- Department of General Surgery, The First Affiliated Hospital of University of Science and Technology of China (USTC), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Dawei Deng
- Department of General Surgery, First Affiliated Hospital of Dalian Medical University, Dalian, China
- Clinical Laboratory of Integrative Medicine, First Affiliated Hospital of Dalian Medical University, Dalian, China
- Department of Hepato-Biliary-Pancreas, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
| | - Tianfu Wei
- Department of General Surgery, First Affiliated Hospital of Dalian Medical University, Dalian, China
| | - Zeming Wu
- iPhenome Biotechnology (Yun Pu Kang) Inc., Dalian, China
| | - Biao Zhang
- Department of General Surgery, First Affiliated Hospital of Dalian Medical University, Dalian, China
| | - Qihang Yuan
- Department of General Surgery, First Affiliated Hospital of Dalian Medical University, Dalian, China
| | - Guogang Liang
- Department of General Surgery, First Affiliated Hospital of Dalian Medical University, Dalian, China
| | - Yanfeng Liu
- Department of General Surgery, First Affiliated Hospital of Dalian Medical University, Dalian, China
- *Correspondence: Yanfeng Liu; Peiyuan Yin
| | - Peiyuan Yin
- Clinical Laboratory of Integrative Medicine, First Affiliated Hospital of Dalian Medical University, Dalian, China
- Institute of Integrative Medicine, Dalian Medical University, Dalian, China
| |
|
17
|
Rostami MA, Frontalini F, Giordano P, Francescangeli F, Alves Martins MV, Dyer L, Spagnoli F. Testing the applicability of random forest modeling to examine benthic foraminiferal responses to multiple environmental parameters. MARINE ENVIRONMENTAL RESEARCH 2021; 172:105502. [PMID: 34638002 DOI: 10.1016/j.marenvres.2021.105502] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/09/2021] [Revised: 09/25/2021] [Accepted: 10/07/2021] [Indexed: 06/13/2023]
Abstract
The main environmental variables controlling benthic foraminiferal distributions were identified and used to assess their influence on ecological indices developed as predictors of Ecological Quality Status (EcoQS) in marine ecosystems. Gradient forest and random forest models were applied to assess the predictive value of a selection of abiotic (environmental) and biotic (foraminifera) variables in a coastal marine area in the central Adriatic Sea (Italy). This approach yields evidence that the predictor variables sand, silt, Pollution Load Index, and TN have the greatest influence on the distribution of benthic foraminifera in this area. In addition, we identify thresholds for the most important environmental variables that influence ecological indices. These findings contribute to efforts to determine how best to improve sediment quality and environmental stability for marine conservation. Further application of these approaches offers policymakers a useful tool for surveying the diversity of marine organisms and for improving the ability to protect and restore marine ecosystems by identifying predictors of diversity and key thresholds in these predictors.
Affiliation(s)
- Masoud A Rostami
- Department of Biology, University of Nevada, Reno, Reno, NV, 89557, USA.
| | - Fabrizio Frontalini
- Department of Pure and Applied Sciences, Università degli Studi di Urbino "Carlo Bo", 61029, Urbino, Italy
| | - Patrizia Giordano
- Istituto di Scienze Polari, Consiglio Nazionale delle Ricerche, 40129, Bologna, Italy
| | - Fabio Francescangeli
- University of Hamburg, Institute for Geology, Centre for Earth System Research and Sustainability, Bundesstraße, 55, 20146, Hamburg, Germany
| | - Maria Virginia Alves Martins
- Rio de Janeiro State University (UERJ), R. São Francisco Xavier, 524, LabMicro 4037F, Maracanã, Rio de Janeiro, 20550-900, Brazil; Aveiro University, Department of Geosciences, GeoBioTec, Campus de Santiago, 3810-197, Aveiro, Portugal
| | - Lee Dyer
- Department of Biology, University of Nevada, Reno, Reno, NV, 89557, USA
| | - Federico Spagnoli
- Istituto per le Risorse Biologiche e le Biotecnologie Marine, Consiglio Nazionale delle Ricerche, 60125, Ancona, Italy; School of Science and Technology, Geology division, University of Camerino, 62032, Camerino, Italy
| |
|
18
|
Li Y, Pan J, Zhou N, Fu D, Lian G, Yi J, Peng Y, Liu X. A random forest model predicts responses to infliximab in Crohn's disease based on clinical and serological parameters. Scand J Gastroenterol 2021; 56:1030-1039. [PMID: 34304688 DOI: 10.1080/00365521.2021.1939411] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
BACKGROUND Infliximab (IFX) has recently revolutionised the treatment of Crohn's disease (CD), but a proportion of patients show no response to it at the end of the induction period. We developed a random forest-based tool to predict the response to IFX in CD patients. METHODS This observational study retrospectively enrolled patients who were diagnosed with active CD and received IFX treatment at the Gastroenterology Department of Xiangya Hospital of Central South University between January 2017 and December 2019. Baseline data were recorded at the start of treatment and were used as predictor variables to construct models to forecast the response to IFX. RESULTS Our cohort comprised a total of 174 patients, with a response rate of 29.3% (51/174). The area under the receiver operating characteristic curve (AUC) for the random forest model was 0.90 (95%CI: 0.82-0.98), compared with an AUC of 0.68 (95%CI: 0.52-0.85) for the logistic regression model. The optimal cut-off value of the random forest model was 0.34, with a specificity of 0.94, a sensitivity of 0.81 and an accuracy of 0.85. We demonstrated a strong association of IFX response with the levels of complement C3 (C3), high density lipoprotein, serum albumin, Controlling Nutritional Status (CONUT) score and visceral fat area/subcutaneous fat area ratio (VSR). CONCLUSION A novel random forest model using baseline clinical and serological parameters was established to identify CD patients with baseline inflammation who are likely to achieve an IFX response. This model could be valuable for physicians, patients and insurers, as it allows individualised therapy.
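To make the reported cut-off statistics concrete, the sketch below shows how sensitivity, specificity and accuracy follow from a probability cut-off applied to model outputs. It is an illustration only: the function name `metrics_at_cutoff`, the toy labels and probabilities, and the convention that a probability at or above the cut-off counts as a predicted responder are all assumptions, not details taken from the study.

```python
def metrics_at_cutoff(labels, probs, cutoff):
    """Sensitivity, specificity and accuracy at a given probability cut-off.

    Assumption for illustration: prob >= cutoff is a predicted responder (1).
    """
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1 and predicted == 0:
            fn += 1
        elif y == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one responder and one non-responder")
    return {
        "sensitivity": tp / (tp + fn),  # responders correctly identified
        "specificity": tn / (tn + fp),  # non-responders correctly identified
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy data (made up): 1 = responder, 0 = non-responder.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.90, 0.60, 0.35, 0.20, 0.50, 0.30, 0.25, 0.10, 0.05, 0.33]
m = metrics_at_cutoff(labels, probs, cutoff=0.34)
print(m)
```

Moving the cut-off trades sensitivity against specificity, which is why an "optimal" value such as the reported 0.34 depends on the criterion used to choose it.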
Affiliation(s)
- Yong Li
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, China
| | - Jianfeng Pan
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, China
| | - Nan Zhou
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, China
| | - Dongni Fu
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, China
| | - Guanghui Lian
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, China
| | - Jun Yi
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, China
| | - Yu Peng
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, China
| | - Xiaowei Liu
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, China; Hunan International Scientific and Technological Cooperation Base of Artificial Intelligence Computer Aided Diagnosis and Treatment for Digestive Disease, Xiangya Hospital, Central South University, Changsha, China
| |
|
19
|
Soil Organic Matter Prediction Model with Satellite Hyperspectral Image Based on Optimized Denoising Method. REMOTE SENSING 2021. [DOI: 10.3390/rs13122273] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/16/2022]
Abstract
In order to improve the signal-to-noise ratio of hyperspectral sensors and exploit the potential of satellite hyperspectral data for predicting soil properties, we took MingShui County, which covers approximately 1481 km², as the study area and selected a Gaofen-5 (GF-5) satellite hyperspectral image of the area to explore an applicable and accurate denoising method that can effectively improve the prediction accuracy of soil organic matter (SOM) content. First, fractional-order derivative (FOD) processing is performed on the original reflectance (OR) to evaluate the optimal FOD. Second, singular value decomposition (SVD), Fourier transform (FT) and discrete wavelet transform (DWT) are used to denoise the OR and the optimal FOD reflectance. Third, the spectral indexes of the reflectance under different denoising methods are extracted by an optimal band combination algorithm, and the input variables for the different denoising methods are selected by the recursive feature elimination (RFE) algorithm. Finally, the SOM content is predicted by a random forest prediction model. The results reveal that 0.6-order reflectance describes more useful details in satellite hyperspectral data. Five spectral indexes extracted from the reflectance under different denoising methods have a strong correlation with the SOM content, which is helpful for realizing high-accuracy SOM predictions. All three denoising methods can reduce the noise in hyperspectral data, and the accuracies of the different denoising methods are ranked DWT > FT > SVD, where 0.6-order-DWT has the highest accuracy (R² = 0.84, RMSE = 3.36 g kg⁻¹, and RPIQ = 1.71). This paper is relatively novel in that GF-5 satellite hyperspectral data based on different denoising methods are used to predict SOM, and the results provide a highly robust and novel method for mapping the spatial distribution of SOM content at the regional scale.
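The three accuracy measures quoted above (R², RMSE and RPIQ) can be computed from paired observed and predicted values. The sketch below uses common definitions and is not the authors' code: R² is taken as 1 minus the ratio of residual to total sum of squares (some studies report the squared correlation instead), RPIQ is the interquartile range of the observed values divided by the RMSE, and the quartiles come from Python's `statistics.quantiles` with its default method. The function name `regression_metrics` and the toy values are hypothetical.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return R2, RMSE and RPIQ for paired observed/predicted values.

    Assumed definitions (illustrative, may differ from the cited study):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(mean squared error)
      RPIQ = (Q3 - Q1 of observed values) / RMSE
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    # Quartile convention: statistics.quantiles default ("exclusive") method.
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": rmse, "rpiq": (q3 - q1) / rmse}


# Toy SOM values in g/kg (made up for illustration).
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 37.0, 52.0]
result = regression_metrics(observed, predicted)
print(result)
```

Because RPIQ scales the error by the spread of the observed data, it allows comparison across datasets with different SOM ranges, whereas RMSE alone is tied to the measurement units.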
|
20
|
Peng J, Duan Z, Guo Y, Li X, Luo X, Han X, Luo J. Identification of candidate biomarkers of liver hydatid disease via microarray profiling, bioinformatics analysis, and machine learning. J Int Med Res 2021; 49:300060521993980. [PMID: 33787392 PMCID: PMC8020228 DOI: 10.1177/0300060521993980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Objectives Liver echinococcosis is a severe zoonotic disease caused by Echinococcus (tapeworm) infection, which is epidemic in the Qinghai region of China. Here, we aimed to explore biomarkers and establish a predictive model for the diagnosis of liver echinococcosis. Methods Microarray profiling followed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analysis was performed in liver tissue from patients with liver hydatid disease and from healthy controls from the Qinghai region of China. A protein–protein interaction (PPI) network and random forest model were established to identify potential biomarkers and predict the occurrence of liver echinococcosis, respectively. Results Microarray profiling identified 1152 differentially expressed genes (DEGs), including 936 upregulated genes and 216 downregulated genes. Several previously unreported biological processes and signaling pathways were identified. The FCGR2B and CTLA4 proteins were identified by the PPI networks and random forest model. The random forest model based on FCGR2B and CTLA4 reliably predicted the occurrence of liver hydatid disease, with an area under the receiver operator characteristic curve of 0.921. Conclusion Our findings give new insight into gene expression in patients with liver echinococcosis from the Qinghai region of China, improving our understanding of hepatic hydatid disease.
Affiliation(s)
- Jinwu Peng
- Department of Pathology, Xiangya Basic Medical School, Central South University, Changsha, Hunan, China; Department of Pathology, Xiangya Changde Hospital, Changde, Hunan, China; Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Zhili Duan
- Department of Pathology, Qinghai Provincial People's Hospital, Xining, Qinghai, China
- Yamin Guo
- Department of General Surgery, Qinghai Provincial People's Hospital, Xining, Qinghai, China
- Xiaona Li
- Department of Pathology, Qinghai Provincial People's Hospital, Xining, Qinghai, China
- Xiaoqin Luo
- Department of Pathology, Qinghai Provincial People's Hospital, Xining, Qinghai, China
- Xiumin Han
- Department of General Surgery, Qinghai Provincial People's Hospital, Xining, Qinghai, China
- Junming Luo
- Department of Pathology, Xiangya Changde Hospital, Changde, Hunan, China; Department of Pathology, Qinghai Provincial People's Hospital, Xining, Qinghai, China
|
21
|
Bioinformatics identification of lncRNA biomarkers associated with the progression of esophageal squamous cell carcinoma. Mol Med Rep 2019; 19:5309-5320. [PMID: 31059058 PMCID: PMC6522958 DOI: 10.3892/mmr.2019.10213] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/12/2018] [Accepted: 03/18/2019] [Indexed: 12/31/2022] Open
Abstract
The poor outcome of patients with esophageal squamous cell carcinoma (ESCC) highlights the importance of identifying novel, effective prognostic biomarkers. Long non-coding RNAs (lncRNAs) serve regulatory roles in various types of cancer. The aim of the present study was to investigate the lncRNA expression profile in ESCC and to identify lncRNAs associated with the prognosis of ESCC by performing comprehensive bioinformatics analyses. The RNA-sequencing (Seq) expression dataset GSE53625 generated from ESCC samples was used as a training dataset. Additional RNA-Seq datasets related to ESCC samples were downloaded from The Cancer Genome Atlas and used as a validation dataset. Data were screened using the limma package, and differentially expressed lncRNAs between early- and late-stage ESCC were identified. A random forest algorithm was used to select the optimal lncRNA biomarkers, which were then analyzed using the support vector machine (SVM) algorithm with R software. The identified lncRNA biomarkers were examined in the validation dataset by bidirectional hierarchical clustering and using an SVM classifier. Subsequently, univariate and multivariate Cox regression analyses were performed to analyze the potential ability of the lncRNAs to predict the survival rate of patients with ESCC. By examining the training group, 259 deregulated lncRNAs between early- and advanced-stage ESCC were identified. Further bioinformatics analyses identified a nine-lncRNA signature, including AC098973, AL133493, RP11-51M24, RP11-317N8, RP11-834C11, RP11-69C17, LINC00471, LINC01193 and RP1-124C. This nine-lncRNA signature was used to predict the tumor stage and patient survival rate with high reliability and accuracy in the training and validation datasets. Furthermore, these nine lncRNA biomarkers were primarily involved in regulating the cell cycle and DNA replication, and these processes were previously identified to be associated with the progression of ESCC. The identified nine-lncRNA signature was found to be associated with the tumor stage, and could be used as a predictor of the survival rate of patients with ESCC.
|
22
|
Which spatial distribution model best predicts the occurrence of dominant species in semi-arid rangeland of northern Iran? Ecol Inform 2019. [DOI: 10.1016/j.ecoinf.2018.12.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
|
23
|
Thampi BV, Wong T, Lukashin C, Loeb NG. Determination of CERES TOA fluxes using Machine learning algorithms. Part I: Classification and retrieval of CERES cloudy and clear scenes. Journal of Atmospheric and Oceanic Technology 2017; 34:2329-2345. [PMID: 33505104 PMCID: PMC7837512 DOI: 10.1175/jtech-d-16-0183.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/12/2023]
Abstract
Continuous monitoring of the Earth radiation budget (ERB) is critical to our understanding of the Earth's climate and its variability with time. The Clouds and the Earth's Radiant Energy System (CERES) instrument is able to provide a long record of ERB for such scientific studies. This manuscript, which is the first of a two-part paper, describes the new CERES algorithm for improving the clear/cloudy scene classification without the use of coincident cloud imager data. This new CERES algorithm is based on Machine Learning (ML), a subset of the modern artificial intelligence (AI) paradigm. This paper describes the development and application of the ML algorithm known as Random Forests (RF), which is used to classify CERES broadband footprint measurements into clear and cloudy scenes. Results from the RF analysis carried out using the CERES Single Scanner Footprint (SSF) data for the months of January and July are presented in the manuscript. The daytime RF misclassification rate (MCR) shows relatively large values (>30%) for snow, sea ice and bright desert surface types and lower values (<10%) for the forest surface type. MCR values observed for the nighttime data are in general larger than the daytime MCR values for most of the surface types. The modified MCR values are lower (<4%) for most surface types after thin cloud data are excluded from the analysis. Sensitivity analysis shows that the number of input variables and the number of decision trees used in the RF analysis have a substantial influence on the classification error.
Affiliation(s)
- Takmeng Wong
- NASA Langley Research Centre, Hampton, VA, USA 23681
- Norman G Loeb
- NASA Langley Research Centre, Hampton, VA, USA 23681
|
24
|
Zhao K, Jing X, Sanders NJ, Chen L, Shi Y, Flynn DFB, Wang Y, Chu H, Liang W, He J. On the controls of abundance for soil‐dwelling organisms on the Tibetan Plateau. Ecosphere 2017. [DOI: 10.1002/ecs2.1901] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022] Open
Affiliation(s)
- Ke Zhao
- Department of Ecology, College of Urban and Environmental Sciences, and Key Laboratory for Earth Surface Processes of the Ministry of Education, Peking University, 5 Yiheyuan Road, Beijing 100871, China
- Xin Jing
- Department of Ecology, College of Urban and Environmental Sciences, and Key Laboratory for Earth Surface Processes of the Ministry of Education, Peking University, 5 Yiheyuan Road, Beijing 100871, China
- Nathan J. Sanders
- Rubenstein School of Environment and Natural Resources, University of Vermont, Burlington, Vermont 05405, USA
- Litong Chen
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, 23 Xinning Road, Xining 810008, China
- Yu Shi
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- Dan F. B. Flynn
- The Arnold Arboretum of Harvard University, 1300 Centre Street, Boston, Massachusetts 02131, USA
- Yonghui Wang
- Department of Ecology, College of Urban and Environmental Sciences, and Key Laboratory for Earth Surface Processes of the Ministry of Education, Peking University, 5 Yiheyuan Road, Beijing 100871, China
- Haiyan Chu
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- Wenju Liang
- State Key Laboratory of Forest and Soil Ecology, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang 110164, China
- Jin‐Sheng He
- Department of Ecology, College of Urban and Environmental Sciences, and Key Laboratory for Earth Surface Processes of the Ministry of Education, Peking University, 5 Yiheyuan Road, Beijing 100871, China
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, 23 Xinning Road, Xining 810008, China
|
25
|
Pellatt DF, Stevens JR, Wolff RK, Mullany LE, Herrick JS, Samowitz W, Slattery ML. Expression Profiles of miRNA Subsets Distinguish Human Colorectal Carcinoma and Normal Colonic Mucosa. Clin Transl Gastroenterol 2016; 7:e152. [PMID: 26963002 PMCID: PMC4822091 DOI: 10.1038/ctg.2016.11] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Received: 11/17/2015] [Accepted: 02/03/2016] [Indexed: 02/06/2023] Open
Abstract
OBJECTIVES MicroRNAs (miRNAs) are small, non-protein-coding RNA molecules that are commonly dysregulated in colorectal tumors. The objective of this study was to identify smaller subsets of highly predictive miRNAs. METHODS Data come from population-based studies of colorectal cancer conducted in Utah and the Kaiser Permanente Medical Care Program. Tissue samples were available for 1,953 individuals, of which 1,894 had carcinoma tissue and 1,599 had normal mucosa available for statistical analysis. Agilent Human miRNA Microarray V.19.0 was used to generate miRNA expression profiles; validation of expression levels was carried out using quantitative PCR. We used random forest analysis and verified findings with logistic modeling in separate data sets. Important microRNAs are identified and bioinformatics tools are used to identify target genes and related biological pathways. RESULTS We identified 16 miRNAs for colon and 17 miRNAs for rectal carcinoma that appear to differentiate between carcinoma and normal mucosa; of these, 12 were important for both colon and rectal cancer: hsa-miR-663b, hsa-miR-4539, hsa-miR-17-5p, hsa-miR-20a-5p, hsa-miR-21-5p, hsa-miR-4506, hsa-miR-92a-3p, hsa-miR-93-5p, hsa-miR-145-5p, hsa-miR-3651, hsa-miR-378a-3p, and hsa-miR-378i. Estimated misclassification rates were low at 4.83% and 2.5% among colon and rectal observations, respectively. Among independent observations, logistic modeling reinforced the importance of these miRNAs, finding the primary principal components of their variation statistically significant (P<0.001 among both colon and rectal observations) and again producing low misclassification rates. Repeating our analysis without those miRNAs initially identified as important identified other important miRNAs; however, misclassification rates increased and distinctions between remaining miRNAs in terms of classification importance were reduced. CONCLUSIONS Our data support the hypothesis that while many miRNAs are dysregulated between carcinoma and normal mucosa, smaller subsets of these miRNAs are useful and informative in discriminating between these tissues.
Affiliation(s)
- Daniel F Pellatt
- Department of Internal Medicine, University of Utah, Salt Lake City, Utah, USA
- John R Stevens
- Department of Mathematics and Statistics, Utah State University, Logan, Utah, USA
- Roger K Wolff
- Department of Internal Medicine, University of Utah, Salt Lake City, Utah, USA
- Lila E Mullany
- Department of Internal Medicine, University of Utah, Salt Lake City, Utah, USA
- Jennifer S Herrick
- Department of Internal Medicine, University of Utah, Salt Lake City, Utah, USA
- Wade Samowitz
- Department of Pathology, University of Utah, Salt Lake City, Utah, USA
- Martha L Slattery
- Department of Internal Medicine, University of Utah, Salt Lake City, Utah, USA
|
26
|
Amaratunga D, Cabrera J, Lee YS. Resampling-based similarity measures for high-dimensional data. J Comput Biol 2015; 22:54-62. [PMID: 25493697 DOI: 10.1089/cmb.2014.0195] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/14/2023] Open
Abstract
An important issue in classification is the assessment of sample similarity. This is nontrivial in high-dimensional or megavariate datasets, that is, datasets comprising simultaneous measurements on thousands of features, many of which carry little or no information regarding consistent sample differences. Conventional similarity measures do not work particularly well for such data. As an alternative, we propose a distance measure that is based on a refiltering process: at each step of the process a random subset of features is selected and a cluster analysis is performed using only this subset; the relative frequency with which a pair of samples clusters together across several such random subsets forms the similarity measure. The features chosen at any step may be completely random or enriched by awarding the more informative features a higher chance of selection; this enrichment turns out to be particularly effective. We use actual datasets from the burgeoning genomics literature to demonstrate the superior performance of this similarity measure, especially the enriched form of the similarity measure, compared to more conventional measures such as Euclidean distance or correlation, or, if the data are categorical, Hamming distance.
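The resampling procedure described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`two_means`, `pick_features`, `resampling_similarity`), the use of a tiny 2-means clustering as the per-round cluster analysis, the subset size, and the number of rounds are all hypothetical choices made for brevity. Passing `weights` gives a rough analogue of the "enriched" variant, in which more informative features are more likely to be drawn.

```python
import random


def two_means(points, rng, iters=20):
    """Tiny k-means with k=2 (an assumed stand-in for the cluster analysis)."""
    centers = [list(p) for p in rng.sample(points, 2)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (ties go to cluster 0).
        labels = [
            min((0, 1), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pick_features(rng, n_features, k, weights=None):
    """Draw k distinct feature indices; weights give the 'enriched' variant."""
    pool = list(range(n_features))
    w = list(weights) if weights is not None else [1.0] * n_features
    chosen = []
    for _ in range(k):
        i = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(i))
        w.pop(i)
    return chosen


def resampling_similarity(data, n_rounds=200, subset_size=2, weights=None, seed=0):
    """Fraction of rounds in which each pair of samples lands in the same cluster."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        feats = pick_features(rng, len(data[0]), subset_size, weights)
        labels = two_means([[row[f] for f in feats] for row in data], rng)
        for i in range(n):
            for j in range(n):
                together[i][j] += labels[i] == labels[j]
    return [[count / n_rounds for count in row] for row in together]


if __name__ == "__main__":
    toy = [[0, 0, 0, 0], [1, 1, 1, 1], [10, 10, 10, 10], [11, 11, 11, 11]]
    for row in resampling_similarity(toy, n_rounds=50):
        print(row)
```

The returned matrix can be turned into a distance by taking one minus each entry; any clustering routine could replace `two_means` without changing the surrounding logic.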
|
27
|
Philibert A, Loyce C, Makowski D. Prediction of N2O emission from local information with Random Forest. Environmental Pollution (Barking, Essex : 1987) 2013; 177:156-163. [PMID: 23500053 DOI: 10.1016/j.envpol.2013.02.019] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 11/23/2012] [Revised: 01/25/2013] [Accepted: 02/08/2013] [Indexed: 06/01/2023]
Abstract
Nitrous oxide is a potent greenhouse gas, with a global warming potential 298 times greater than that of CO2. In agricultural soils, N2O emissions are influenced by a large number of environmental characteristics and crop management techniques that are not systematically reported in experiments. Random Forest (RF) is a machine learning method that can handle missing data and ranks input variables on the basis of their importance. We aimed to predict N2O emission on the basis of local information, to rank environmental and crop management variables according to their influence on N2O emission, and to compare the performances of RF with several regression models. RF outperformed the regression models for predictive purposes, and this approach led to the identification of three important input variables: N fertilization, type of crop, and experiment duration. This method could be used in the future for prediction of N2O emissions from local information.
Affiliation(s)
- Aurore Philibert
- AgroParisTech, UMR 211 Agronomie, F-78000 Thiverval Grignon, France
|
28
|
Yan Zhi, Li Jiangeng, Xiong Yimin, Xu Weitian, Zheng Guorong. Identification of candidate colon cancer biomarkers by applying a random forest approach on microarray data. Oncol Rep 2012; 28:1036-42. [DOI: 10.3892/or.2012.1891] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 03/13/2012] [Accepted: 06/08/2012] [Indexed: 11/05/2022] Open
|
29
|
Simpson GL, Birks HJB. Statistical Learning in Palaeolimnology. Tracking Environmental Change Using Lake Sediments 2012. [DOI: 10.1007/978-94-007-2745-8_9] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
|
30
|
Wu J, Jiang C, Houston D, Baker D, Delfino R. Automated time activity classification based on global positioning system (GPS) tracking data. Environ Health 2011; 10:101. [PMID: 22082316 PMCID: PMC3256108 DOI: 10.1186/1476-069x-10-101] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 06/07/2011] [Accepted: 11/14/2011] [Indexed: 05/22/2023]
Abstract
BACKGROUND Air pollution epidemiological studies are increasingly using global positioning system (GPS) to collect time-location data because they offer continuous tracking, high temporal resolution, and minimum reporting burden for participants. However, substantial uncertainties in the processing and classifying of raw GPS data create challenges for reliably characterizing time activity patterns. We developed and evaluated models to classify people's major time activity patterns from continuous GPS tracking data. METHODS We developed and evaluated two automated models to classify major time activity patterns (i.e., indoor, outdoor static, outdoor walking, and in-vehicle travel) based on GPS time activity data collected under free living conditions for 47 participants (N = 131 person-days) from the Harbor Communities Time Location Study (HCTLS) in 2008 and supplemental GPS data collected from three UC-Irvine research staff (N = 21 person-days) in 2010. Time activity patterns used for model development were manually classified by research staff using information from participant GPS recordings, activity logs, and follow-up interviews. We evaluated two models: (a) a rule-based model that developed user-defined rules based on time, speed, and spatial location, and (b) a random forest decision tree model. RESULTS Indoor, outdoor static, outdoor walking and in-vehicle travel activities accounted for 82.7%, 6.1%, 3.2% and 7.2% of manually-classified time activities in the HCTLS dataset, respectively. The rule-based model classified indoor and in-vehicle travel periods reasonably well (Indoor: sensitivity > 91%, specificity > 80%, and precision > 96%; in-vehicle travel: sensitivity > 71%, specificity > 99%, and precision > 88%), but the performance was moderate for outdoor static and outdoor walking predictions. No striking differences in performance were observed between the rule-based and the random forest models. The random forest model was fast and easy to execute, but was likely less robust than the rule-based model under the condition of biased or poor quality training data. CONCLUSIONS Our models can successfully identify indoor and in-vehicle travel points from the raw GPS data, but challenges remain in developing models to distinguish outdoor static points and walking. Accurate training data are essential in developing reliable models in classifying time-activity patterns.
Affiliation(s)
- Jun Wu
- Program in Public Health, University of California, Irvine, USA
- Department of Epidemiology, School of Medicine, University of California, Irvine, USA
- Douglas Houston
- Department of Planning, Policy and Design, School of Social Ecology, University of California, Irvine, USA
- Dean Baker
- Center for Occupational & Environmental Health, University of California, Irvine, USA
- Ralph Delfino
- Department of Epidemiology, School of Medicine, University of California, Irvine, USA
31
|
Abstract
The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited for genetic applications since it is both computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, there has been inconsistent and less than optimal use of RF in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners are able to apply it in their work. An emphasis is placed on showing how the various components contribute to bias and variance, as well as on discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms.
|
32
|
Leaper R, Hill NA, Edgar GJ, Ellis N, Lawrence E, Pitcher CR, Barrett NS, Thomson R. Predictions of beta diversity for reef macroalgae across southeastern Australia. Ecosphere 2011. [DOI: 10.1890/es11-00089.1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022] Open
|
33
|
An attempt to use ectoparasites as tags for habitat occupancy by small mammalian hosts in central Europe: effects of host gender, parasite taxon and season. Parasitology 2011; 138:609-18. [PMID: 21320388 DOI: 10.1017/s0031182011000102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]
Abstract
OBJECTIVE We used data on fleas and gamasid mites parasitic on 8 species of small mammals to test whether (a) species composition of ectoparasite infracommunities may be used to predict host habitat occupancy and (b) the accuracy of this prediction differs between ectoparasite taxa, host genders and seasons. METHODS We used a Random Forests algorithm that is based on the methodology of classification trees. RESULTS The accuracy of prediction of habitat occupancy was relatively low and varied substantially among host species. The combined rate of correct prediction of host habitat occupancy from data on ectoparasites was significantly higher than 50%, although still relatively low. The accuracy of prediction (a) did not differ between male and female hosts when it was based on species composition of fleas in summer or of mites in summer and winter, (b) was significantly higher in male hosts than in female hosts when the winter data on fleas were used and (c) was significantly higher for flea than for mite assemblages. An effect of season was found in mites but not in fleas, with the accuracy of prediction being significantly higher in summer than in winter assemblages. CONCLUSIONS Ectoparasites did not appear to be especially useful as biological markers for distinguishing host populations in different habitats in temperate zones.
|
34
|
Wei CL, Rowe GT, Escobar-Briones E, Boetius A, Soltwedel T, Caley MJ, Soliman Y, Huettmann F, Qu F, Yu Z, Pitcher CR, Haedrich RL, Wicksten MK, Rex MA, Baguley JG, Sharma J, Danovaro R, MacDonald IR, Nunnally CC, Deming JW, Montagna P, Lévesque M, Weslawski JM, Wlodarska-Kowalczuk M, Ingole BS, Bett BJ, Billett DSM, Yool A, Bluhm BA, Iken K, Narayanaswamy BE. Global patterns and predictions of seafloor biomass using random forests. PLoS One 2010; 5:e15323. [PMID: 21209928 PMCID: PMC3012679 DOI: 10.1371/journal.pone.0015323] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.8] [Received: 08/10/2010] [Accepted: 11/08/2010] [Indexed: 11/25/2022] Open
Abstract
A comprehensive seafloor biomass and abundance database has been constructed from 24 oceanographic institutions worldwide within the Census of Marine Life (CoML) field projects. The machine-learning algorithm, Random Forests, was employed to model and predict seafloor standing stocks from surface primary production, water-column integrated and export particulate organic matter (POM), seafloor relief, and bottom water properties. The predictive models explain 63% to 88% of stock variance among the major size groups. Individual and composite maps of predicted global seafloor biomass and abundance are generated for bacteria, meiofauna, macrofauna, and megafauna (invertebrates and fishes). Patterns of benthic standing stocks were positive functions of surface primary production and delivery of the particulate organic carbon (POC) flux to the seafloor. At a regional scale, the census maps illustrate that integrated biomass is highest at the poles, on continental margins associated with coastal upwelling and with broad zones associated with equatorial divergence. Lowest values are consistently encountered on the central abyssal plains of major ocean basins. The shift of biomass dominance groups with depth is shown to be affected by the decrease in average body size rather than abundance, presumably due to a decrease in the quantity and quality of food supply. This biomass census and associated maps are vital components of mechanistic deep-sea food web models and global carbon cycling, and as such provide fundamental information that can be incorporated into evidence-based management.
Affiliation(s)
- Chih-Lin Wei
- Department of Oceanography, Texas A&M University, College Station, Texas, United States of America.
|
35
|
Himes BE, Wu AC, Duan QL, Klanderman B, Litonjua AA, Tantisira K, Ramoni MF, Weiss ST. Predicting response to short-acting bronchodilator medication using Bayesian networks. Pharmacogenomics 2009; 10:1393-412. [PMID: 19761364 DOI: 10.2217/pgs.09.93] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/21/2022] Open
Abstract
AIMS Bronchodilator response tests measure the effect of β2-agonists, the most commonly used short-acting reliever drugs for asthma. We sought to relate candidate gene SNP data with bronchodilator response and measure the predictive accuracy of a model constructed with genetic variants. MATERIALS & METHODS Bayesian networks, multivariate models that are able to account for simultaneous associations and interactions among variables, were used to create a predictive model of bronchodilator response using candidate gene SNP data from 308 Childhood Asthma Management Program Caucasian subjects. RESULTS The model found that 15 SNPs in 15 genes predict bronchodilator response with fair accuracy, as established by a fivefold cross-validation area under the receiver-operating characteristic curve of 0.75 (standard error: 0.03). CONCLUSION Bayesian networks are an attractive approach to analyze large-scale pharmacogenetic SNP data because of their ability to automatically learn complex models that can be used for the prediction and discovery of novel biological hypotheses.
Affiliation(s)
- Blanca E Himes
- Harvard-MIT Division of Health Sciences and Technology, MA, USA.
|
36
|
Qin H, Chan MWY, Liyanarachchi S, Balch C, Potter D, Souriraj IJ, Cheng ASL, Agosto-Perez FJ, Nikonova EV, Yan PS, Lin HJ, Nephew KP, Saltz JH, Showe LC, Huang THM, Davuluri RV. An integrative ChIP-chip and gene expression profiling to model SMAD regulatory modules. BMC Systems Biology 2009; 3:73. [PMID: 19615063 PMCID: PMC2724489 DOI: 10.1186/1752-0509-3-73] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 08/26/2008] [Accepted: 07/17/2009] [Indexed: 12/24/2022]
Abstract
Background The TGF-β/SMAD pathway is part of a broader signaling network in which crosstalk between pathways occurs. While the molecular mechanisms of TGF-β/SMAD signaling pathway have been studied in detail, the global networks downstream of SMAD remain largely unknown. The regulatory effect of SMAD complex likely depends on transcriptional modules, in which the SMAD binding elements and partner transcription factor binding sites (SMAD modules) are present in specific context. Results To address this question and develop a computational model for SMAD modules, we simultaneously performed chromatin immunoprecipitation followed by microarray analysis (ChIP-chip) and mRNA expression profiling to identify TGF-β/SMAD regulated and synchronously coexpressed gene sets in ovarian surface epithelium. Intersecting the ChIP-chip and gene expression data yielded 150 direct targets, of which 141 were grouped into 3 co-expressed gene sets (sustained up-regulated, transient up-regulated and down-regulated), based on their temporal changes in expression after TGF-β activation. We developed a data-mining method driven by the Random Forest algorithm to model SMAD transcriptional modules in the target sequences. The predicted SMAD modules contain SMAD binding element and up to 2 of 7 other transcription factor binding sites (E2F, P53, LEF1, ELK1, COUPTF, PAX4 and DR1). Conclusion Together, the computational results further the understanding of the interactions between SMAD and other transcription factors at specific target promoters, and provide the basis for more targeted experimental verification of the co-regulatory modules.
Affiliation(s)
- Huaxia Qin
- Human Cancer Genetics Program, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA.
|
37
|
Identification of differential gene expression for microarray data using recursive random forest. Chin Med J (Engl) 2008. [DOI: 10.1097/00029330-200812020-00005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022] Open
|
38
|
Chang JS, Yeh RF, Wiencke JK, Wiemels JL, Smirnov I, Pico AR, Tihan T, Patoka J, Miike R, Sison JD, Rice T, Wrensch MR. Pathway analysis of single-nucleotide polymorphisms potentially associated with glioblastoma multiforme susceptibility using random forests. Cancer Epidemiol Biomarkers Prev 2008; 17:1368-73. [PMID: 18559551 DOI: 10.1158/1055-9965.epi-07-2830] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.1] [Indexed: 11/16/2022] Open
Abstract
Glioma is a complex disease that is unlikely to result from the effect of a single gene. Genetic analysis at the pathway level involving multiple genes may be more likely to capture gene-disease associations than analyzing genes one at a time. The current pilot study included 112 Caucasians with glioblastoma multiforme and 112 Caucasian healthy controls frequency matched to cases by age and gender. Subjects were genotyped using a commercially available (ParAllele/Affymetrix) assay panel of 10,177 nonsynonymous coding single-nucleotide polymorphisms (SNP) spanning the genome known at the time the panel was constructed. For this analysis, we selected 10 pathways potentially involved in gliomagenesis that had SNPs represented on the panel. We performed random forests (RF) analyses of SNPs within each pathway group and logistic regression to assess interaction among genes in the one pathway for which the RF prediction error was better than chance and the permutation P < 0.10. Only the DNA repair pathway had a better than chance classification of case-control status with a prediction error of 45.5% and P = 0.09. Three SNPs (rs1047840 of EXO1, rs12450550 of EME1, and rs799917 of BRCA1) of the DNA repair pathway were identified as promising candidates for further replication. In addition, statistically significant interactions (P < 0.05) between rs1047840 of EXO1 and rs799917 or rs1799966 of BRCA1 were observed. Despite less than complete inclusion of genes and SNPs relevant to glioma and a small sample size, RF analysis identified one important biological pathway and several SNPs potentially associated with the development of glioblastoma.
Affiliation(s)
- Jeffrey S Chang
- Department of Epidemiology and Biostatistics, University of California, San Francisco, 44 Page Street, Suite 503, San Francisco, CA 94143-1215, USA.
|
39
|
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. Random forests for classification in ecology. Ecology 2007; 88:2783-92. [PMID: 18051647 DOI: 10.1890/07-0539.1] [Citation(s) in RCA: 1457] [Impact Index Per Article: 85.7] [Indexed: 11/18/2022]
Abstract
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.
Affiliation(s)
- D Richard Cutler
- Department of Mathematics and Statistics, Utah State University, Logan, Utah 84322-3900, USA.
|
40
|
|