1
Qiu J, Hu Y, Li L, Erzurumluoglu AM, Braenne I, Whitehurst C, Schmitz J, Arora J, Bartholdy BA, Gandhi S, Khoueiry P, Mueller S, Noyvert B, Ding Z, Jensen JN, de Jong J. Deep representation learning for clustering longitudinal survival data from electronic health records. Nat Commun 2025; 16:2534. [PMID: 40087274 PMCID: PMC11909183 DOI: 10.1038/s41467-025-56625-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 01/21/2025] [Indexed: 03/17/2025]
Abstract
Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data as extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn's disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn's disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.
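The abstract reports benchmarks on datasets "with known ground-truth cluster labels". One common way to score a clustering against such labels is accuracy under the best one-to-one matching of predicted cluster IDs to true labels; whether the VaDeSC-EHR evaluation uses exactly this metric is an assumption here, and the function name and toy labels below are hypothetical. A minimal Python sketch:

```python
from itertools import permutations


def cluster_accuracy(true_labels, pred_labels):
    """Share of samples whose predicted cluster, after the best one-to-one
    relabelling of cluster IDs, equals the ground-truth label.

    Brute-force search over label permutations: fine for a handful of
    clusters, but the cost grows factorially with the number of clusters.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must not be empty")

    # Distinct IDs in first-seen order; order is irrelevant because every
    # permutation is tried.
    true_ids = list(dict.fromkeys(true_labels))
    pred_ids = list(dict.fromkeys(pred_labels))
    # Pad with None so that surplus predicted clusters can stay unmatched.
    candidates = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))

    best = 0
    for assignment in permutations(candidates, len(pred_ids)):
        mapping = dict(zip(pred_ids, assignment))
        correct = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, correct)
    return best / len(true_labels)


# Hypothetical toy labels (not from the paper): 5 of 6 samples match under
# the best mapping {1: 0, 0: 1, 2: 2}, so acc equals 5/6.
acc = cluster_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0])
```

For more than a few clusters, the same matching is usually solved with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` on the negated contingency table) instead of enumerating permutations.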
Affiliation(s)
- Jiajun Qiu
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Yao Hu
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Li Li
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Abdullah Mesut Erzurumluoglu
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Ingrid Braenne
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Charles Whitehurst
  - Immunology & Respiratory Diseases, Boehringer Ingelheim, Ridgefield, CT, USA
- Jochen Schmitz
  - Immunology & Respiratory Diseases, Boehringer Ingelheim, Ridgefield, CT, USA
- Jatin Arora
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Boris Alexander Bartholdy
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Shrey Gandhi
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Pierre Khoueiry
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Stefanie Mueller
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Boris Noyvert
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Zhihao Ding
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Jan Nygaard Jensen
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Johann de Jong
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
2
Rodríguez-Fuertes A, Alard-Josemaría J, Sandubete JE. Measuring the Candidates' Emotions in Political Debates Based on Facial Expression Recognition Techniques. Front Psychol 2022; 13:785453. [PMID: 35615169 PMCID: PMC9126085 DOI: 10.3389/fpsyg.2022.785453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2021] [Accepted: 04/14/2022] [Indexed: 11/13/2022]
Abstract
This article presents an analysis of the main Spanish political candidates for the April 2019 elections. The analysis focuses on Facial Expression Analysis (FEA), a technique widely used in neuromarketing research. FEA makes it possible to identify micro-expressions: very brief, involuntary signals of hidden emotions that cannot be controlled voluntarily. The video of each candidate's closing statement was post-processed using the classification algorithms of iMotions' AFFDEX platform, and the resulting data were then analyzed. First, we identified and compared the basic emotions shown by each politician. Second, we associated these basic emotions with specific moments of each candidate's speech, identifying the topics addressed and relating them directly to the emotion expressed. Third, we analyzed whether the differences shown by each candidate in each emotion are statistically significant, applying the non-parametric chi-squared goodness-of-fit test. We also used analysis of variance (ANOVA) to test whether the candidates differ on average. Finally, we checked whether the evaluations of the debate in surveys by the main Spanish media are consistent with the results of our empirical analysis. A predominance of negative emotions was observed. Some inconsistencies were found between the emotion expressed on the face and the verbal content of the message. In addition, the statistical analysis confirms that the differences observed between the candidates with respect to the basic emotions are, on average, statistically significant.
This article thus makes a methodological contribution to the analysis of public figures' communication, which could help politicians improve the effectiveness of their messages by identifying and evaluating the intensity of the emotions they express.
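The two tests named above can be sketched from first principles. The Python below computes the Pearson chi-squared goodness-of-fit statistic and the one-way ANOVA F statistic with their degrees of freedom; it does not compute p-values. Testing against equal expected counts is an assumption (the abstract does not state the expected distribution), and the function names and all numbers are hypothetical, not the study's data.

```python
def chi_squared_gof(observed, expected=None):
    """Pearson chi-squared goodness-of-fit statistic and degrees of freedom.

    If `expected` is None, equal expected counts across categories are
    assumed (a uniform null hypothesis).
    """
    k = len(observed)
    if k < 2:
        raise ValueError("need at least two categories")
    n = sum(observed)
    if expected is None:
        expected = [n / k] * k
    if len(expected) != k or any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive, one per category")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


def one_way_anova_f(groups):
    """F statistic and (between, within) degrees of freedom for a one-way ANOVA.

    Each element of `groups` is a list of numeric scores for one group
    (for example, one candidate's per-frame emotion intensities).
    """
    k = len(groups)
    values = [x for g in groups for x in g]
    n = len(values)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more values than groups")
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance: F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical counts for one emotion across three candidates: (10.0, 2).
stat, dof = chi_squared_gof([10, 20, 30])
# Hypothetical scores for three candidates: (27.0, 2, 6).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Each statistic is then compared against the chi-squared or F distribution with the returned degrees of freedom; `scipy.stats.chisquare` (uniform expected counts by default) and `scipy.stats.f_oneway` return the same statistics together with p-values.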
Affiliation(s)
- Julio E Sandubete
  - Complutense University of Madrid, Madrid, Spain
  - CEU San Pablo University, Madrid, Spain
3
Fernández-Gutiérrez F, Kennedy JI, Cooksey R, Atkinson M, Choy E, Brophy S, Huo L, Zhou SM. Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework. Diagnostics (Basel) 2021; 11:diagnostics11101908. [PMID: 34679609 PMCID: PMC8534858 DOI: 10.3390/diagnostics11101908] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/16/2021] [Revised: 10/06/2021] [Accepted: 10/13/2021] [Indexed: 11/16/2022]
Abstract
(1) Background: We aimed to develop a transparent machine-learning (ML) framework to automatically identify patients with a given condition from electronic health records (EHRs) using a parsimonious set of features. (2) Methods: We linked multiple EHR sources, including 917,496,869 primary care records, 40,656,805 secondary care records, and 694,954 records from specialist surgeries between 2002 and 2012, to generate a unique dataset. We then treated patient identification as a text classification problem and proposed a transparent disease-phenotyping framework. The framework comprises generation of patient representations, feature selection, and development of an optimal phenotyping algorithm, designed to tackle the imbalanced nature of the data. It was evaluated extensively on identifying rheumatoid arthritis (RA) and ankylosing spondylitis (AS). (3) Results: Applied to the linked dataset of 9657 patients, including 1484 RA cases and 204 AS cases, the framework achieved an accuracy of 86.19% and a positive predictive value (PPV) of 88.46% for RA, and an accuracy of 99.23% and a PPV of 97.75% for AS, comparable with expert knowledge-driven methods. (4) Conclusions: This framework could potentially serve as an efficient tool for identifying patients with a condition of interest from EHRs, supporting clinicians in clinical decision-making.
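The headline metrics for this framework, accuracy and positive predictive value, are standard ratios of confusion-matrix counts. The Python sketch below computes both; the paper's confusion matrices are not given in the abstract, so all counts and function names here are hypothetical. It also illustrates why the class imbalance matters: assuming, for illustration only, that all 9657 patients are scored for AS, a trivial rule that never flags anyone already reaches high accuracy while its positive predictive value is undefined.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all patients classified correctly (cases and non-cases)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no patients to evaluate")
    return (tp + tn) / total


def positive_predictive_value(tp, fp):
    """Share of patients flagged as cases that truly are cases.

    Returns None when nothing was flagged, because PPV is then undefined.
    """
    flagged = tp + fp
    if flagged == 0:
        return None
    return tp / flagged


# Hypothetical classifier on 1000 patients: accuracy 0.96, PPV 0.8.
acc_example = accuracy(tp=80, fp=20, tn=880, fn=20)
ppv_example = positive_predictive_value(tp=80, fp=20)

# Imbalance illustration (assumption: all 9657 patients scored for AS,
# 204 of them true AS cases): a rule that never flags AS.
trivial_acc = accuracy(tp=0, fp=0, tn=9657 - 204, fn=204)  # about 0.979
trivial_ppv = positive_predictive_value(tp=0, fp=0)        # None
```

Reporting PPV alongside accuracy exposes the trivial rule, which scores roughly 97.9% accuracy without identifying a single case.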
Affiliation(s)
- Fabiola Fernández-Gutiérrez
  - Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK
- Jonathan I. Kennedy
  - Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK
- Roxanne Cooksey
  - Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK
- Mark Atkinson
  - Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK
- Ernest Choy
  - Arthritis Research UK CREATE Centre, Division of Infection and Immunity, Cardiff University, Cardiff CF10 3NB, UK
  - Welsh Arthritis Research Network, School of Medicine, Cardiff University, Cardiff CF10 3NB, UK
- Sinead Brophy
  - Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK
- Lin Huo
  - China-ASEAN Research Institute, Guangxi University, Nanning 530004, China
- Shang-Ming Zhou
  - Centre for Health Technology, Faculty of Health, University of Plymouth, Plymouth PL4 8AA, UK
  - Correspondence: or