1
Kim K, Oh SJ, Lee J, Kwon A, Yu CY, Kim S, Choi CH, Kang SB, Kim TO, Park DI, Lee CK. Regulatory Variants on the Leukocyte Immunoglobulin-Like Receptor Gene Cluster are Associated with Crohn's Disease and Interact with Regulatory Variants for TAP2. J Crohns Colitis 2024; 18:47-53. [PMID: 37523193 DOI: 10.1093/ecco-jcc/jjad127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Indexed: 08/01/2023]
Abstract
BACKGROUND AND AIMS Crohn's disease [CD] has a complex polygenic aetiology with high heritability. There is ongoing effort to identify novel variants associated with susceptibility to CD through a genome-wide association study [GWAS] in large Korean populations. METHODS Genome-wide variant data from 902 Korean patients with CD and 72 179 controls were used to assess the genetic associations in a meta-analysis with previous Korean GWAS results from 1621 patients with CD and 4419 controls. Epistatic interactions between CD-risk variants of interest were tested using a multivariate logistic regression model with an interaction term. RESULTS We identified two novel genetic associations with the risk of CD near ZBTB38 and within the leukocyte immunoglobulin-like receptor [LILR] gene cluster [p < 5 × 10⁻⁸], with highly consistent effect sizes between the two independent Korean cohorts. CD-risk variants in the LILR locus are known quantitative trait loci [QTL] for multiple LILR genes, of which LILRB2 directly interacts with various ligands including MHC class I molecules. The LILR lead variant exhibited a significant epistatic interaction with CD-associated regulatory variants for TAP2 involved in the antigen presentation of MHC class I molecules [p = 4.11 × 10⁻⁴], showing higher CD-risk effects of the TAP2 variant in individuals carrying more risk alleles of the LILR lead variant (odds ratio [OR] = 0.941, p = 0.686 in non-carriers; OR = 1.45, p = 2.51 × 10⁻⁴ in single-copy carriers; OR = 2.38, p = 2.76 × 10⁻⁶ in two-copy carriers). CONCLUSIONS This study demonstrated that genetic variants at two novel susceptibility loci and the epistatic interaction between variants in LILR and TAP2 loci confer a risk of CD.
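The stratified odds ratios in this abstract are what a logistic model with an interaction term produces. The following is a minimal sketch, assuming additive allele dosages (0, 1, 2) and a linear interaction; the function name and the two coefficients are hypothetical, back-calculated from the reported non-carrier and two-copy odds ratios for illustration, and are not the study's fitted values.

```python
import math


def tap2_odds_ratio(beta_tap2: float, beta_interaction: float, lilr_dosage: int) -> float:
    """Per-allele odds ratio of the TAP2 variant at a given LILR risk-allele dosage.

    Assumed model (illustrative only):
        logit(P) = b0 + b_lilr*LILR + b_tap2*TAP2 + b_int*LILR*TAP2
    so the TAP2 log-odds per allele equals b_tap2 + b_int * LILR.
    """
    return math.exp(beta_tap2 + beta_interaction * lilr_dosage)


# Hypothetical coefficients chosen so that dosage 0 and dosage 2 reproduce
# the odds ratios quoted in the abstract (0.941 and 2.38).
BETA_TAP2 = math.log(0.941)
BETA_INTERACTION = (math.log(2.38) - math.log(0.941)) / 2

for dosage in (0, 1, 2):
    print(dosage, round(tap2_odds_ratio(BETA_TAP2, BETA_INTERACTION, dosage), 3))
```

Under this linear-interaction assumption the single-copy value is the geometric mean of the two endpoints (about 1.50), which is close to the reported 1.45.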
Affiliation(s)
- Kwangwoo Kim
- Department of Biology, Kyung Hee University, Seoul, Republic of Korea
- Department of Biomedical and Pharmaceutical Sciences, Kyung Hee University, Seoul, Republic of Korea
- Shin Ju Oh
- Department of Gastroenterology, Center for Crohn's and Colitis, Kyung Hee University College of Medicine, Seoul, Republic of Korea
- Junho Lee
- Department of Biology, Kyung Hee University, Seoul, Republic of Korea
- Department of Biomedical and Pharmaceutical Sciences, Kyung Hee University, Seoul, Republic of Korea
- Ayeong Kwon
- Department of Biology, Kyung Hee University, Seoul, Republic of Korea
- Chae-Yeon Yu
- Department of Biomedical and Pharmaceutical Sciences, Kyung Hee University, Seoul, Republic of Korea
- Sangsoo Kim
- Department of Bioinformatics, Soongsil University, Seoul, Republic of Korea
- Chang Hwan Choi
- Department of Internal Medicine, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Sang-Bum Kang
- Department of Internal Medicine, College of Medicine, Daejeon St. Mary's Hospital, The Catholic University of Korea, Daejeon, Republic of Korea
- Tae Oh Kim
- Department of Internal Medicine, Haeundae Paik Hospital, Inje University College of Medicine, Busan, Republic of Korea
- Dong Il Park
- Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Chang Kyun Lee
- Department of Gastroenterology, Center for Crohn's and Colitis, Kyung Hee University College of Medicine, Seoul, Republic of Korea
2
Wang X, Qiao Y, Cui Y, Ren H, Zhao Y, Linghu L, Ren J, Zhao Z, Chen L, Qiu L. An explainable artificial intelligence framework for risk prediction of COPD in smokers. BMC Public Health 2023; 23:2164. [PMID: 37932692 PMCID: PMC10626705 DOI: 10.1186/s12889-023-17011-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 10/17/2023] [Indexed: 11/08/2023] Open
Abstract
BACKGROUND Owing to the inconspicuous nature of the early signs of Chronic Obstructive Pulmonary Disease (COPD), affected individuals often remain unidentified, leading to missed opportunities for timely prevention and treatment. The purpose of this study was to create an explainable artificial intelligence framework combining data preprocessing methods, machine learning methods, and model interpretability methods to identify people at high risk of COPD in the smoking population and to provide a reasonable interpretation of model predictions. METHODS The data comprised questionnaire information, physical examination data and results of pulmonary function tests before and after bronchodilatation. First, the factorial analysis for mixed data (FAMD), Boruta and NRSBoundary-SMOTE resampling methods were used to address the missing data, high dimensionality and class imbalance problems. Then, seven classification models (CatBoost, NGBoost, XGBoost, LightGBM, random forest, SVM and logistic regression) were applied to model the risk level, and the best machine learning (ML) model's decisions were explained using the Shapley additive explanations (SHAP) method and partial dependence plot (PDP). RESULTS In the smoking population, age and 14 other variables were significant factors for predicting COPD. The CatBoost, random forest, and logistic regression models performed reasonably well in unbalanced datasets. CatBoost with NRSBoundary-SMOTE had the best classification performance in balanced datasets when composite indicators (the AUC, F1-score, and G-mean) were used as model comparison criteria. Age, COPD Assessment Test (CAT) score, gross annual income, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), anhelation, respiratory disease, central obesity, use of polluting fuel for household heating, region, use of polluting fuel for household cooking, and wheezing were important factors for predicting COPD in the smoking population.
CONCLUSION This study combined feature screening methods, unbalanced data processing methods, and advanced machine learning methods to enable early identification of COPD risk groups in the smoking population. COPD risk factors in the smoking population were identified using SHAP and PDP, with the goal of providing theoretical support for targeted screening strategies and smoking population self-management strategies.
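Two of the composite indicators named in this abstract, the F1-score and the G-mean, can be computed directly from a confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study, and it assumes every denominator is non-zero.

```python
import math


def composite_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """F1-score and G-mean from confusion-matrix counts (assumes non-zero denominators)."""
    sensitivity = tp / (tp + fn)   # recall on the positive (COPD) class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "g_mean": g_mean}


# Hypothetical counts for an imbalanced test set (50 positives, 100 negatives).
print(composite_metrics(tp=40, fp=10, tn=90, fn=10))
```

The G-mean drops sharply when either class is poorly recognised, which is why it is often reported alongside the AUC for imbalanced data.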
Affiliation(s)
- Xuchun Wang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Yuchao Qiao
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Yu Cui
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Hao Ren
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Ying Zhao
- Shanxi Centre for Disease Control and Prevention, Taiyuan, Shanxi, 030012, China
- Liqin Linghu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Shanxi Centre for Disease Control and Prevention, Taiyuan, Shanxi, 030012, China
- Jiahui Ren
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Zhiyang Zhao
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Limin Chen
- The Fifth Hospital (Shanxi People's Hospital) of Shanxi Medical University, Taiyuan, Shanxi, 030012, P.R. China.
- Lixia Qiu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China.
3
Morley TJ, Willimitis D, Ripperger M, Lee H, Han L, Zhou Y, Kang J, Davis LK, Smoller JW, Choi KW, Walsh CG, Ruderfer DM. Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models. medRxiv 2023:2023.11.01.23297927. [PMID: 37961557 PMCID: PMC10635256 DOI: 10.1101/2023.11.01.23297927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Studies of the value of genetic information for improving the performance of clinical risk prediction models have yielded variable conclusions. Many methodological decisions have the potential to contribute to differential results across studies. Here, we performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) and genetic data to understand which decision points may affect performance. Clinical data in the form of structured diagnostic codes, medications, procedural codes, and demographics were extracted from two large independent health systems and polygenic risk scores (PRS) were generated across all patients with genetic data in the corresponding biobanks. Crohn's disease was used as the model phenotype based on its substantial genetic component, established EHR-based definition, and sufficient prevalence for model training and testing. We investigated the impact of PRS integration method, as well as choices regarding training sample, model complexity, and performance metrics. Overall, our results show that including PRS resulted in higher performance by some metrics but the gain in performance was only robust when combined with demographic data alone. Improvements were inconsistent or negligible after including additional clinical information. The impact of genetic information on performance also varied by PRS integration method, with a small improvement in some cases from combining PRS with the output of a clinical model (late-fusion) compared to its inclusion as an additional feature (early-fusion). The effects of other modeling decisions varied between institutions, though performance increased with more compute-intensive models such as random forest. This work highlights the importance of considering methodological decision points in interpreting the impact on prediction performance when including PRS information in clinical models.
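The distinction between early fusion and late fusion drawn in this abstract can be illustrated with a small sketch. It assumes simple logistic models throughout; the function names, weights, and inputs are all hypothetical, and the preprint's actual models (which include random forests) are not reproduced here.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def early_fusion_risk(clinical_features, prs, weights, prs_weight, intercept):
    """Early fusion: the PRS is one more input feature of a single model."""
    z = intercept + sum(w * x for w, x in zip(weights, clinical_features))
    return sigmoid(z + prs_weight * prs)


def late_fusion_risk(clinical_risk, prs, w_clinical, w_prs, intercept):
    """Late fusion: a second-stage model combines the clinical model's
    predicted risk (converted to a logit) with the PRS."""
    clinical_logit = math.log(clinical_risk / (1.0 - clinical_risk))
    return sigmoid(intercept + w_clinical * clinical_logit + w_prs * prs)


# Hypothetical patient: two clinical features and a standardised PRS of 1.0.
print(early_fusion_risk([1.0, 0.0], 1.0, weights=[0.5, -0.2], prs_weight=0.3, intercept=-2.0))
print(late_fusion_risk(0.2, 1.0, w_clinical=1.0, w_prs=0.5, intercept=0.0))
```

In the late-fusion form the clinical model can be trained once and reused, with only the small second-stage combiner fitted on the subset of patients who have genetic data.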
Affiliation(s)
- Theodore J. Morley
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville TN
- Drew Willimitis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Michael Ripperger
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Hyunjoon Lee
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Lide Han
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville TN
- Yu Zhou
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Jooeun Kang
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Lea K. Davis
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
- Jordan W. Smoller
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA
- Karmel W. Choi
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Colin G. Walsh
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
- Douglas M. Ruderfer
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
4
Park SK, Lee GY, Kim S, Lee CW, Choi CH, Kang SB, Kim TO, Chun J, Cha JM, Im JP, Ahn KS, Kim SY, Kim MS, Lee CK, Park DI. Enrichment of Activated Fibroblasts as a Potential Biomarker for a Non-Durable Response to Anti-Tumor Necrosis Factor Therapy in Patients with Crohn's Disease. Int J Mol Sci 2023; 24:14799. [PMID: 37834250 PMCID: PMC10573580 DOI: 10.3390/ijms241914799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 09/19/2023] [Accepted: 09/27/2023] [Indexed: 10/15/2023] Open
Abstract
We investigated whether the response to anti-tumor necrosis factor (anti-TNF) treatment varied according to inflammatory tissue characteristics in Crohn's disease (CD). Bulk RNA sequencing (RNA-seq) data were obtained from inflamed and non-inflamed tissues from 170 patients with CD. The samples were clustered based on gene expression profiles using principal coordinate analysis (PCA). Cellular heterogeneity was inferred from the bulk RNA-seq data using CIBERSORTx. The PCA results displayed two clusters of CD-inflamed samples: one close to (Inflamed_1) and the other far away (Inflamed_2) from the non-inflamed samples. Inflamed_1 was rich in anti-TNF durable responders (DRs), and Inflamed_2 was enriched in non-durable responders (NDRs). The CIBERSORTx results showed that the cell fraction of activated fibroblasts was six times higher in Inflamed_2 than in Inflamed_1. Validation with public gene expression datasets (GSE16879) revealed that the activated fibroblasts were enriched in NDRs over DRs by 1.9 times pre-treatment and 7.5 times after treatment. Fibroblast activation protein (FAP) was overexpressed in Inflamed_2 and was also overexpressed in the NDRs in both the RISK and GSE16879 datasets. The activation of fibroblasts may play a role in resistance to anti-TNF therapy. Characterizing fibroblasts in inflamed tissues at diagnosis may help to identify patients who are likely to respond to anti-TNF therapy.
Affiliation(s)
- Soo-Kyung Park
- Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, School of Medicine, Sungkyunkwan University, Seoul 03181, Republic of Korea
- Medical Research Institute, Kangbuk Samsung Hospital, School of Medicine, Sungkyunkwan University, Seoul 03181, Republic of Korea
- Gi-Young Lee
- Department of Bioinformatics, Soongsil University, Seoul 06978, Republic of Korea
- Sangsoo Kim
- Department of Bioinformatics, Soongsil University, Seoul 06978, Republic of Korea
- Chil-Woo Lee
- Medical Research Institute, Kangbuk Samsung Hospital, School of Medicine, Sungkyunkwan University, Seoul 03181, Republic of Korea
- Chang-Hwan Choi
- Department of Internal Medicine, College of Medicine, Chung-Ang University, Seoul 06974, Republic of Korea
- Sang-Bum Kang
- Department of Internal Medicine, Daejeon St. Mary’s Hospital, Daejeon 34943, Republic of Korea
- Tae-Oh Kim
- Department of Internal Medicine, Haeundae Paik Hospital, College of Medicine, Inje University, Busan 47392, Republic of Korea
- Jaeyoung Chun
- Department of Internal Medicine, Gangnam Severance Hospital, College of Medicine, Yonsei University, Seoul 03722, Republic of Korea
- Jae-Myung Cha
- Department of Internal Medicine, Kyung Hee University Hospital at Gang Dong, College of Medicine, Kyung Hee University, Seoul 02447, Republic of Korea
- Jong-Pil Im
- Department of Internal Medicine, Liver Research Institute, College of Medicine, Seoul National University, Seoul 08826, Republic of Korea
- Kwang-Sung Ahn
- Functional Genome Institute, PDXen Biosystems, Inc., Daejeon 34027, Republic of Korea
- Seon-Young Kim
- Personalized Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
- Min-Suk Kim
- Department of Human Intelligence and Robot Engineering, Sangmyung University, Cheonan 31066, Republic of Korea
- Chang-Kyun Lee
- Department of Gastroenterology, Center for Crohn’s and Colitis, Kyung Hee University Hospital, College of Medicine, Kyung Hee University, Seoul 02447, Republic of Korea
- Dong-Il Park
- Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, School of Medicine, Sungkyunkwan University, Seoul 03181, Republic of Korea
- Medical Research Institute, Kangbuk Samsung Hospital, School of Medicine, Sungkyunkwan University, Seoul 03181, Republic of Korea
5
Witvoet S, de Massari D, Shi S, Chen AF. Leveraging large, real-world data through machine-learning to increase efficiency in robotic-assisted total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc 2023:10.1007/s00167-023-07314-1. [PMID: 36650339 DOI: 10.1007/s00167-023-07314-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/17/2022] [Accepted: 01/04/2023] [Indexed: 01/19/2023]
Abstract
PURPOSE Increased operative time can be due to patient, surgeon and surgical factors, and may be predicted by machine learning (ML) modeling to potentially improve staff utilization and operating room efficiency. The purposes of our study were to: (1) determine how demographic, surgeon, and surgical factors affected operative times, and (2) train a ML model to estimate operative time for robotic-assisted primary total knee arthroplasty (TKA). METHODS A retrospective study from 2007 to 2020 was conducted including 300,000 unilateral primary TKA cases. Demographic and surgical variables were evaluated using Wilcoxon/Kruskal-Wallis tests to determine significant factors of operative time as predictors in the ML models. For the ML analysis of robotic-assisted TKAs (> 18,000), two algorithms were used to learn the relationship between selected predictors and operative time. Predictive model performance was subsequently assessed on a test data set comparing predicted and actual operative time. Root mean square error (RMSE), R² and percentage of predictions with an error < 5/10/15 min were computed. RESULTS Male sex, BMI > 40 kg/m² and cemented implants were associated with increased operative time, while age > 65 years, cementless fixation, and high surgeon case volume were associated with reduced operative time. Robotic-assisted TKA increased operative time for low-volume surgeons and decreased operative time for high-volume surgeons. Both ML models provided more accurate operative time predictions than standard time estimates based on surgeon historical averages. CONCLUSIONS This study demonstrated that greater surgeon case volume, cementless fixation, manual TKA, female, older and non-obese patients reduced operative time. ML prediction of operative time can be more accurate than historical averages, which may lead to optimized operating room utilization. LEVEL OF EVIDENCE III.
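The three evaluation measures listed in the methods (RMSE, R² and the share of predictions within 5, 10 or 15 minutes) can be computed as in the sketch below. The function name and the four example cases are hypothetical; operative times are assumed to be in minutes.

```python
import math


def evaluate_predictions(actual, predicted, thresholds=(5, 10, 15)):
    """Return RMSE, R^2 and the percentage of cases whose absolute error
    is strictly below each threshold (all values in minutes)."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    within = {t: 100.0 * sum(abs(e) < t for e in errors) / n for t in thresholds}
    return rmse, r2, within


# Hypothetical operative times (minutes) for four cases.
rmse, r2, within = evaluate_predictions([60, 80, 100, 120], [62, 74, 112, 100])
print(round(rmse, 2), round(r2, 3), within)
```

Reporting the within-threshold percentages alongside RMSE is useful for scheduling, because a few large misses can dominate RMSE while most cases are still predicted closely.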
Affiliation(s)
- Sarah Shi
- Stryker Corporation, Mahwah, NJ, USA
- Antonia F Chen
- Department of Orthopaedics, Brigham and Women's Hospital, Harvard Medical School, 75 Francis Street, Boston, MA, 02115, USA.
6
Lee KS, Kim ES. Explainable Artificial Intelligence in the Early Diagnosis of Gastrointestinal Disease. Diagnostics (Basel) 2022; 12:2740. [PMID: 36359583 PMCID: PMC9689865 DOI: 10.3390/diagnostics12112740] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/03/2022] [Revised: 11/03/2022] [Accepted: 11/06/2022] [Indexed: 08/29/2023] Open
Abstract
This study reviews the recent progress of explainable artificial intelligence for the early diagnosis of gastrointestinal disease (GID). The source of data was eight original studies in PubMed. The search terms were "gastrointestinal" (title) together with "random forest" or "explainable artificial intelligence" (abstract). The eligibility criteria were the dependent variable of GID or a strongly associated disease, the intervention(s) of artificial intelligence, the outcome(s) of accuracy and/or the area under the receiver operating characteristic curve (AUC), the outcome(s) of variable importance and/or the Shapley additive explanations (SHAP), a publication year of 2020 or later, and the publication language of English. The ranges of performance measures were reported to be 0.70-0.98 for accuracy, 0.04-0.25 for sensitivity, and 0.54-0.94 for the AUC. The following factors were discovered to be top-10 predictors of gastrointestinal bleeding in the intensive care unit: mean arterial pressure (max), bicarbonate (min), creatinine (max), PMN, heart rate (mean), Glasgow Coma Scale, age, respiratory rate (mean), prothrombin time (max) and aspartate aminotransferase (max). In a similar vein, the following variables were found to be top-10 predictors for the intake of almond, avocado, broccoli, walnut, whole-grain barley, and/or whole-grain oat: Roseburia undefined, Lachnospira spp., Oscillibacter undefined, Subdoligranulum spp., Streptococcus salivarius subsp. thermophilus, Parabacteroides distasonis, Roseburia spp., Anaerostipes spp., Lachnospiraceae ND3007 group undefined, and Ruminiclostridium spp. Explainable artificial intelligence provides an effective, non-invasive decision support system for the early diagnosis of GID.
Affiliation(s)
- Kwang-Sig Lee
- AI Center, Korea University Anam Hospital, Seoul 02841, Korea
- Eun Sun Kim
- Department of Gastroenterology, Korea University Anam Hospital, Seoul 02841, Korea
7
Stafford IS, Gosink MM, Mossotto E, Ennis S, Hauben M. A Systematic Review of Artificial Intelligence and Machine Learning Applications to Inflammatory Bowel Disease, with Practical Guidelines for Interpretation. Inflamm Bowel Dis 2022; 28:1573-1583. [PMID: 35699597 PMCID: PMC9527612 DOI: 10.1093/ibd/izac115] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/03/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND Inflammatory bowel disease (IBD) is a gastrointestinal chronic disease with an unpredictable disease course. Computational methods such as machine learning (ML) have the potential to stratify IBD patients for the provision of individualized care. The use of ML methods for IBD was surveyed, with an additional focus on how the field has changed over time. METHODS On May 6, 2021, a systematic review was conducted through a search of MEDLINE and Embase databases, with the search structure ("machine learning" OR "artificial intelligence") AND ("Crohn* Disease" OR "Ulcerative Colitis" OR "Inflammatory Bowel Disease"). Exclusion criteria included studies not written in English, no human patient data, publication before 2001, studies that were not peer reviewed, nonautoimmune disease comorbidity research, and record types that were not primary research. RESULTS Seventy-eight (of 409) records met the inclusion criteria. Random forest methods were most prevalent, and there was an increase in neural networks, mainly applied to imaging data sets. The main applications of ML to clinical tasks were diagnosis (18 of 78), disease course (22 of 78), and disease severity (16 of 78). The median sample size was 263. Clinical and microbiome-related data sets were most popular. Five percent of studies used an external data set after training and testing for additional model validation. DISCUSSION Availability of longitudinal and deep phenotyping data could lead to better modeling. Machine learning pipelines that account for imbalanced data and that perform feature selection only on training data will generate more generalizable models. Machine learning models are increasingly being applied to more complex clinical tasks for specific phenotypes, indicating progress towards personalized medicine for IBD.
Affiliation(s)
- Enrico Mossotto
- Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- Sarah Ennis
- Address correspondence to: Sarah Ennis, Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
8
Park SK, Kim S, Lee GY, Kim SY, Kim W, Lee CW, Park JL, Choi CH, Kang SB, Kim TO, Bang KB, Chun J, Cha JM, Im JP, Ahn KS, Kim SY, Park DI. Development of a Machine Learning Model to Distinguish between Ulcerative Colitis and Crohn's Disease Using RNA Sequencing Data. Diagnostics (Basel) 2021; 11:2365. [PMID: 34943601 PMCID: PMC8700628 DOI: 10.3390/diagnostics11122365] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/28/2021] [Revised: 12/01/2021] [Accepted: 12/10/2021] [Indexed: 12/13/2022] Open
Abstract
Crohn’s disease (CD) and ulcerative colitis (UC) can be difficult to differentiate. As differential diagnosis is important in establishing a long-term treatment plan for patients, we aimed to develop a machine learning model for the differential diagnosis of the two diseases using RNA sequencing (RNA-seq) data from endoscopic biopsy tissue from patients with inflammatory bowel disease (n = 127; CD, 94; UC, 33). Biopsy samples were taken from inflammatory lesions or normal tissues. The RNA-seq dataset was processed via mapping to the human reference genome (GRCh38) and quantifying the corresponding gene models that comprised 19,596 protein-coding genes. An unsupervised learning model showed distinct clusters of four classes: CD inflammatory, CD normal, UC inflammatory, and UC normal. A supervised learning model based on partial least squares discriminant analysis was able to distinguish inflammatory CD from inflammatory UC after pruning the strong classifiers of normal CD vs. normal UC. The error rate was minimal and affected only two components: 20 and 50 genes for the first and second components, respectively. The corresponding overall error rate was 0.147. RNA-seq analysis of tissue and the two components revealed in this study may be helpful for distinguishing CD from UC.
Affiliation(s)
- Soo-Kyung Park
- Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea;
- Medical Research Institute, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea;
| | - Sangsoo Kim
- Department of Bioinformatics, Soongsil University, Seoul 06978, Korea; (S.K.); (G.-Y.L.); (S.-Y.K.); (W.K.)
| | - Gi-Young Lee
- Department of Bioinformatics, Soongsil University, Seoul 06978, Korea; (S.K.); (G.-Y.L.); (S.-Y.K.); (W.K.)
| | - Sung-Yoon Kim
- Department of Bioinformatics, Soongsil University, Seoul 06978, Korea; (S.K.); (G.-Y.L.); (S.-Y.K.); (W.K.)
| | - Wan Kim
- Department of Bioinformatics, Soongsil University, Seoul 06978, Korea; (S.K.); (G.-Y.L.); (S.-Y.K.); (W.K.)
| | - Chil-Woo Lee
- Medical Research Institute, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea;
| | - Jong-Lyul Park
- Personalized Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea;
| | - Chang-Hwan Choi
- Department of Internal Medicine, College of Medicine, Chung-Ang University, Seoul 04388, Korea;
| | - Sang-Bum Kang
- Department of Internal Medicine, College of Medicine, Daejeon St. Mary’s Hospital, The Catholic University of Korea, Daejeon 34943, Korea;
| | - Tae-Oh Kim
- Department of Internal Medicine, Haeundae Paik Hospital, Inje University College of Medicine, Busan 48108, Korea;
| | - Ki-Bae Bang
- Department of Internal Medicine, Dankook University College of Medicine, Cheonan 31116, Korea;
| | - Jaeyoung Chun
- Department of Internal Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Korea;
| | - Jae-Myung Cha
- Department of Internal Medicine, Kyung Hee University Hospital at Gang Dong, Kyung Hee University College of Medicine, Seoul 05278, Korea;
| | - Jong-Pil Im
- Department of Internal Medicine and Liver Research Institute, College of Medicine, Seoul National University, Seoul 03080, Korea;
| | - Kwang-Sung Ahn
- Functional Genome Institute, PDXen Biosystems Inc., Daejeon 34129, Korea;
| | - Seon-Young Kim
- Personalized Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea;
- Correspondence: (S.-Y.K.); (D.-I.P.); Tel.: +82-42-879-8116 (S.-Y.K.); Tel.: +82-2-2001-8555 (D.-I.P.); Fax: +82-42-879-8119 (S.-Y.K.); Fax: +82-2-2001-8360 (D.-I.P.)
| | - Dong-Il Park
- Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea;
- Medical Research Institute, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea;
|
9
|
Chung CW, Hsiao TH, Huang CJ, Chen YJ, Chen HH, Lin CH, Chou SC, Chen TS, Chung YF, Yang HI, Chen YM. Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus. BioData Min 2021; 14:52. [PMID: 34895289 PMCID: PMC8666017 DOI: 10.1186/s13040-021-00284-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/13/2021] [Accepted: 11/21/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are autoimmune rheumatic diseases that share a complex genetic background and common clinical features. This study's purpose was to construct machine learning (ML) models for the genomic prediction of RA and SLE. METHODS A total of 2,094 patients with RA and 2,190 patients with SLE were enrolled from the Taichung Veterans General Hospital cohort of the Taiwan Precision Medicine Initiative. Genome-wide single nucleotide polymorphism (SNP) data were obtained using the Taiwan Biobank version 2 array. The ML methods used were logistic regression (LR), random forest (RF), support vector machine (SVM), gradient tree boosting (GTB), and extreme gradient boosting (XGB). SHapley Additive exPlanation (SHAP) values were calculated to clarify the contribution of each SNP. Human leukocyte antigen (HLA) imputation was performed using the HLA Genotype Imputation with Attribute Bagging package. RESULTS Compared with LR (area under the curve [AUC] = 0.8247), the RF approach (AUC = 0.9844), SVM (AUC = 0.9828), GTB (AUC = 0.9932), and XGB (AUC = 0.9919) exhibited significantly better prediction performance. The top 20 genes by feature importance and SHAP values included HLA class II alleles. We found that imputed HLA-DQA1*05:01, DQB1*0201, and DRB1*0301 were associated with SLE, whereas HLA-DQA1*03:03, DQB1*0401, and DRB1*0405 were more frequently observed in patients with RA. CONCLUSIONS We established ML methods for genomic prediction of RA and SLE. Genetic variations at HLA-DQA1, HLA-DQB1, and HLA-DRB1 were crucial for differentiating RA from SLE. Future studies are required to verify our results and explore their mechanistic explanation.
Affiliation(s)
- Chih-Wei Chung
- Department of Information Management, National Taiwan University, Taipei, Taiwan
| | - Tzu-Hung Hsiao
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
| | - Chih-Jen Huang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
| | - Yen-Ju Chen
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan
| | - Hsin-Hua Chen
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan
- Rong Hsing Research Center for Translational Medicine & Ph.D. Program in Translational Medicine, National Chung Hsing University, Taichung, Taiwan
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Ching-Heng Lin
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
| | - Seng-Cho Chou
- Department of Information Management, National Taiwan University, Taipei, Taiwan
| | - Tzer-Shyong Chen
- Department of Information Management, Tunghai University, Taichung, Taiwan
| | - Yu-Fang Chung
- Department of Electrical Engineering, Tunghai University, Taichung, Taiwan
| | - Hwai-I Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
| | - Yi-Ming Chen
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan.
- Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan.
- Rong Hsing Research Center for Translational Medicine & Ph.D. Program in Translational Medicine, National Chung Hsing University, Taichung, Taiwan.
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.
- College of Medicine, National Chung Hsing University, 40227, Taichung City, Taiwan.
|
10
|
Liu LP, Lu L, Zhao QQ, Kou QJ, Jiang ZZ, Gui R, Luo YW, Zhao QY. Identification and Validation of the Pyroptosis-Related Molecular Subtypes of Lung Adenocarcinoma by Bioinformatics and Machine Learning. Front Cell Dev Biol 2021; 9:756340. [PMID: 34805165 PMCID: PMC8599430 DOI: 10.3389/fcell.2021.756340] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 08/10/2021] [Accepted: 10/04/2021] [Indexed: 12/20/2022]
Abstract
Lung cancer remains the leading cause of cancer death globally, with lung adenocarcinoma (LUAD) being its most prevalent subtype. Due to the heterogeneity of LUAD, patients given the same treatment regimen may have different responses and clinical outcomes. Therefore, identifying new subtypes of LUAD is important for predicting prognosis and providing personalized treatment for patients. Pyroptosis-related genes play an essential role in anticancer activity, but there is limited research investigating pyroptosis in LUAD. In this study, 33 pyroptosis gene expression profiles and clinical information were collected from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Using bioinformatics and machine learning analyses, we identified novel subtypes of LUAD based on 10 pyroptosis-related genes and further validated them in the GEO dataset, with machine learning models achieving an AUC of up to 1 for classification in the GEO dataset. A web-based tool was established for clinicians to use our clustering model (http://www.aimedicallab.com/tool/aiml-subphe-luad.html). LUAD patients were clustered into 3 subtypes (A, B, and C), and survival analysis showed that B had the best survival outcome and C had the worst survival outcome. The relationships between pyroptosis gene expression and clinical characteristics were further analyzed in the three molecular subtypes. Immune profiling revealed significant differences in immune cell infiltration among the three molecular subtypes. GO enrichment and KEGG pathway analyses were performed based on the differential genes of the three subtypes, indicating that differentially expressed genes (DEGs) were involved in multiple cellular and biological functions, including RNA catabolic process, mRNA catabolic process, and pathways of neurodegeneration-multiple diseases. Finally, we developed an 8-gene prognostic model that accurately predicted 1-, 3-, and 5-year overall survival. In conclusion, pyroptosis-related genes may play a critical role in LUAD, and provide new insights into the underlying mechanisms of LUAD.
Affiliation(s)
- Le-Ping Liu
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
| | - Lu Lu
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
| | - Qiang-Qiang Zhao
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
| | - Qin-Jie Kou
- Department of Laboratory Medicine, The Third Xiangya Hospital of Central South University, Changsha, China
| | - Zhen-Zhen Jiang
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
| | - Rong Gui
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
| | - Yan-Wei Luo
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
| | - Qin-Yu Zhao
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
- College of Engineering and Computer Science, The Australian National University, Canberra, ACT, Australia
|
11
|
Derkacz A, Olczyk P, Olczyk K, Komosinska-Vassev K. The Role of Extracellular Matrix Components in Inflammatory Bowel Diseases. J Clin Med 2021; 10:1122. [PMID: 33800267 PMCID: PMC7962650 DOI: 10.3390/jcm10051122] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/16/2021] [Revised: 03/01/2021] [Accepted: 03/02/2021] [Indexed: 02/07/2023]
Abstract
The remodeling of extracellular matrix (ECM) within the intestine tissues, which simultaneously involves an increased degradation of ECM components and excessive intestinal fibrosis, is a defining trait of the progression of inflammatory bowel diseases (IBDs), which include ulcerative colitis (UC) and Crohn's disease (CD). The increased activity of proteases, especially matrix metalloproteinases (MMPs), leads to excessive degradation of the extracellular matrix and the release of protein and glycoprotein fragments, previously joined with the extracellular matrix, into the circulation. MMPs participate in regulating the functions of the epithelial barrier, the immunological response, and the process of wound healing or intestinal fibrosis. At a later stage of fibrosis during IBD, excessive formation and deposition of the matrix is observed. To assess changes in the extracellular matrix, quantitative measurement of the concentration in the blood of markers dependent on the activity of proteases, involved in the breakdown of extracellular matrix proteins as well as markers indicating the formation of a new ECM, has recently been proposed. This paper describes attempts to use the quantification of ECM components as markers to predict intestinal fibrosis and evaluate the healing process of the gut. The markers which reflect increased ECM degradation, together with the ones which show the process of creating a new matrix during IBD, allow the attainment of important information regarding the changes in the intestinal tissue, epithelial integrity and extracellular matrix remodeling. This paper contains evidence confirming that ECM remodeling is an integral part of directional cell signaling in the progression of IBD, and not only a basis for the ongoing processes.
Affiliation(s)
- Alicja Derkacz
- Department of Clinical Chemistry and Laboratory Diagnostics, Faculty of Pharmaceutical Sciences in Sosnowiec, Medical University of Silesia in Katowice, 41-200 Sosnowiec, Poland; (A.D.); (K.O.)
| | - Paweł Olczyk
- Department of Community Pharmacy, Faculty of Pharmaceutical Sciences in Sosnowiec, Medical University of Silesia in Katowice, 41-200 Sosnowiec, Poland;
| | - Krystyna Olczyk
- Department of Clinical Chemistry and Laboratory Diagnostics, Faculty of Pharmaceutical Sciences in Sosnowiec, Medical University of Silesia in Katowice, 41-200 Sosnowiec, Poland; (A.D.); (K.O.)
| | - Katarzyna Komosinska-Vassev
- Department of Clinical Chemistry and Laboratory Diagnostics, Faculty of Pharmaceutical Sciences in Sosnowiec, Medical University of Silesia in Katowice, 41-200 Sosnowiec, Poland; (A.D.); (K.O.)
- Correspondence: ; Tel.: +48-32364-1150
|