1
Qiu J, Zhu T, Qin K, Zhang W. The interaction network and potential clinical effectiveness of dimensional psychopathology phenotyping based on EMR: a Bayesian network approach. BMC Psychiatry 2025; 25:81. [PMID: 39875818 PMCID: PMC11776203 DOI: 10.1186/s12888-025-06510-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 01/16/2025] [Indexed: 01/30/2025] Open
Abstract
The current DSM-oriented diagnostic paradigm has introduced the issue of heterogeneity, as it fails to account for the identification of the neurological processes underlying mental illnesses, which affects the precision of treatment. The Research Domain Criteria (RDoC) framework serves as a recognized approach to addressing this heterogeneity, and several assessment and translation techniques have been proposed. Among these methods, transforming RDoC scores from electronic medical records (EMR) using Natural Language Processing (NLP) has emerged as a suitable technique, demonstrating clinical effectiveness. Numerous studies have sought to use RDoC to understand the Diagnostic and Statistical Manual of Mental Disorders (DSM) categories from a qualified perspective, but few studies have examined the distribution variations and interaction characteristics of RDoC within various DSM categories through retrospective analyses. Therefore, we employed unsupervised learning to translate five domains of eRDoC scores derived from electronic medical records (EMR) of patients diagnosed with Major Depressive Disorder (MDD), Schizophrenia (SCZ), and Bipolar Disorder (BD) at West China Hospital between 2008 and 2021. The distribution characteristics, interaction networks, and potential clinical effectiveness of RDoC domains were analyzed. Using non-parametric statistical tests, we found that MDD had the highest score in Negative Valence System (NVS) (4.1, p < 0.001), while BD exhibited the highest score in Positive Valence System (PVS) score (4.9, p < 0.001) and Arousal System (AS) (4.4, p < 0.001). SCZ demonstrated the highest scores in Cognitive Systems (CS) (5.8, p < 0.001) and Social Processes Systems (SPS) (4.6, p < 0.001). Through Bayesian network (BN) analysis, we identified relatively consistent interaction relationships among various RDoC domains (NVS → AS, NVS → CS, NVS → PVS, as well as CS → SPS; parameter range = 0.156 to 0.635, p < 0.001). 
Lastly, using logistic regression and Cox proportional hazards models, we demonstrated that AS was, to some extent, significantly associated with the length of hospital stay (-0.21, p < 0.05) and 30-day readmission risk (adjusted odds ratio [aOR] = 0.91, 95% confidence interval [CI] 0.91-0.99). In conclusion, we suggest that eRDoC characteristics vary across DSM diagnostic categories. Using the Bayesian network, we found that NVS and CS might be potential sources that interact with the other systems. Furthermore, CS, SPS and AS were associated with the length of stay and 30-day readmission, making them potentially useful for predicting the prognosis of psychiatric disorders.
Affiliation(s)
- Jianqing Qiu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- Ting Zhu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Medical Big Data Center, Sichuan University, Chengdu, China
- Ke Qin
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Wei Zhang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Medical Big Data Center, Sichuan University, Chengdu, China
  - Mental Health Center and Psychiatric Laboratory, The State Key Laboratory of Biotherapy, West China Hospital of Sichuan University, Chengdu, China
  - Huaxi Brain Research Center, West China Hospital of Sichuan University, Chengdu, China
2
Abstract
PURPOSE OF REVIEW Healthcare has already been affected by the fourth industrial revolution, exemplified by tip-of-the-spear technologies such as artificial intelligence and quantum computing. Yet much remains to be accomplished: systems remain suboptimal, and full interoperability of digital records has not been realized. Given the footprint of technology in healthcare, the field of clinical immunology will certainly see improvements related to these tools. RECENT FINDINGS Biomedical informatics spans the gamut of technology in biomedicine. Within this distinct field, advances are being made that allow systems to be engineered to automate disease detection, create computable phenotypes and improve record portability. Within clinical immunology, technologies are emerging along these lines and are expected to continue to do so. SUMMARY This review highlights advancements in digital health, including learning health systems, electronic phenotyping, artificial intelligence and the use of registries. Technological advancements for improving the diagnosis and care of patients with primary immunodeficiency diseases are also highlighted.
3
Patra BG, Kar R, Roberts K, Wu H. Mental Health Severity Detection from Psychological Forum Data using Domain-Specific Unlabelled Data. AMIA Joint Summits on Translational Science Proceedings 2020; 2020:487-496. [PMID: 32477670 PMCID: PMC7233051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Mental health has become a growing concern in the medical field, yet remains difficult to study due to both privacy concerns and the lack of objectively quantifiable measurements (e.g., lab tests, physical exams). Instead, the data available for mental health are largely based on subjective accounts of a patient's experience, and thus are typically expressed exclusively in text. An important source of such data comes from online sources and directly from the patient, including many forms of social media. In this work, we utilize the datasets provided by the CLPsych shared tasks in 2016 and 2017, derived from online forum posts on ReachOut that have been manually classified according to mental health severity. We implemented an automated severity labeling system using different machine learning and deep learning algorithms. Our approach combines both supervised and semi-supervised embedding methods using corpora from ReachOut (both labeled and unlabelled) and WebMD (unlabelled). Metadata, syntactic, semantic, and embedding features were used to classify the posts into four categories (green, amber, red, and crisis). The developed systems outperformed other state-of-the-art systems developed on the ReachOut dataset and obtained maximum micro-averaged F-scores of 0.86 and 0.80 for the CLPsych 2016 and 2017 test datasets, respectively, using the above features.
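For readers unfamiliar with the metric reported in this abstract, a micro-averaged F-score pools true positives, false positives, and false negatives across classes before computing precision and recall. The following is a minimal sketch, not the authors' evaluation code: the function name `micro_f1`, its `labels` parameter, and the toy posts are hypothetical, and the abstract does not state whether all four severity labels entered the average.

```python
# Hypothetical helper for illustration; not taken from the cited paper.
LABELS = ("green", "amber", "red", "crisis")


def micro_f1(gold, pred, labels=LABELS):
    """Micro-averaged F1: pool TP/FP/FN over `labels`, then compute P, R, F1."""
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1  # predicted this label and it was correct
            elif p == label and g != label:
                fp += 1  # predicted this label but gold differs
            elif g == label and p != label:
                fn += 1  # gold has this label but it was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed, for demonstration only).
gold = ["green", "amber", "red", "crisis", "green"]
pred = ["green", "red", "red", "crisis", "amber"]

score_all = micro_f1(gold, pred)
score_flagged = micro_f1(gold, pred, labels=("amber", "red", "crisis"))
```

When every label is included and each post has exactly one label, the pooled score equals plain accuracy; restricting `labels` to a subset (here, the non-green classes) gives a different value, which is why the label set matters when comparing reported scores.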
Affiliation(s)
- Braja Gopal Patra
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
- Reshma Kar
  - Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata, India
- Kirk Roberts
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Hulin Wu
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
4
SECNLP: A survey of embeddings in clinical natural language processing. J Biomed Inform 2020; 101:103323. [DOI: 10.1016/j.jbi.2019.103323] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 03/17/2019] [Revised: 09/12/2019] [Accepted: 10/27/2019] [Indexed: 12/11/2022]
5
Lee W, Choi J. Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition. BMC Med Inform Decis Mak 2019; 19:132. [PMID: 31307440 PMCID: PMC6632205 DOI: 10.1186/s12911-019-0865-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/24/2018] [Accepted: 07/03/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND This paper presents a conditional random fields (CRF) method that enables the capture of specific high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical entities in a sentence are usually separated from each other, and the textual descriptions in clinical narrative documents frequently indicate causal or posterior relationships that can be used to facilitate clinical named entity recognition. However, the CRF that is generally used for named entity recognition is a first-order model that constrains label transition dependency of adjoining labels under the Markov assumption. METHODS Based on the first-order structure, our proposed model utilizes non-entity tokens between separated entities as an information transmission medium by applying a label induction method. The model is referred to as precursor-induced CRF because its non-entity state memorizes precursor entity information, and the model's structure allows the precursor entity information to propagate forward through the label sequence. RESULTS We compared the proposed model with both first- and second-order CRFs in terms of their F1-scores, using two clinical named entity recognition corpora (the i2b2 2012 challenge and the Seoul National University Hospital electronic health record). The proposed model demonstrated better entity recognition performance than both the first- and second-order CRFs and was also more efficient than the higher-order model. CONCLUSION The proposed precursor-induced CRF which uses non-entity labels as label transition information improves entity recognition F1 score by exploiting long-distance transition factors without exponentially increasing the computational time. In contrast, a conventional second-order CRF model that uses longer distance transition factors showed even worse results than the first-order model and required the longest computation time. 
Thus, the proposed model could offer a considerable performance improvement over current clinical named entity recognition methods based on the CRF models.
Affiliation(s)
- Wangjin Lee
  - Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea
- Jinwook Choi
  - Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea
  - Department of Biomedical Engineering, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea
  - Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, 101 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea
6
Zhang Y, Zhang OR, Li R, Flores A, Selek S, Zhang XY, Xu H. Psychiatric stressor recognition from clinical notes to reveal association with suicide. Health Informatics J 2018; 25:1846-1862. [PMID: 30328378 DOI: 10.1177/1460458218796598] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
Abstract
Suicide takes the lives of nearly a million people each year and is a tremendous economic burden globally. One important type of suicide risk factor is psychiatric stress. Prior studies have mainly used survey data to investigate the association between suicide and stressors. Very few studies have investigated stressor data in electronic health records, mostly because the data are recorded in narrative text. This study takes the initiative to automatically extract and classify psychiatric stressors from clinical text using natural language processing-based methods. Suicidal behaviors were also identified by keywords. Then, a statistical association analysis between suicide ideations/attempts and stressors extracted from a clinical corpus was conducted. Experimental results show that our natural language processing method could recognize stressor entities with an F-measure of 89.01 percent. Mentions of suicidal behaviors were identified with an F-measure of 97.3 percent. The top three significant stressors associated with suicide are health, pressure, and death, which is similar to the findings of previous studies. This study demonstrates the feasibility of using natural language processing approaches to unlock information from psychiatric notes in electronic health records, to facilitate large-scale studies of associations between suicide and psychiatric stressors.
Affiliation(s)
- Yaoyun Zhang
  - The University of Texas Health Science Center at Houston, USA
- Rui Li
  - The University of Texas Health Science Center at Houston, USA
- Hua Xu
  - The University of Texas Health Science Center at Houston, USA
7
Conditional random fields for clinical named entity recognition: A comparative study using Korean clinical texts. Comput Biol Med 2018; 101:7-14. [DOI: 10.1016/j.compbiomed.2018.07.019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/24/2018] [Revised: 07/27/2018] [Accepted: 07/31/2018] [Indexed: 11/30/2022]
8
Du J, Zhang Y, Luo J, Jia Y, Wei Q, Tao C, Xu H. Extracting psychiatric stressors for suicide from social media using deep learning. BMC Med Inform Decis Mak 2018; 18:43. [PMID: 30066665 PMCID: PMC6069295 DOI: 10.1186/s12911-018-0632-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Suicide has been one of the leading causes of deaths in the United States. One major cause of suicide is psychiatric stressors. The detection of psychiatric stressors in an at risk population will facilitate the early prevention of suicidal behaviors and suicide. In recent years, the widespread popularity and real-time information sharing flow of social media allow potential early intervention in a large-scale population. However, few automated approaches have been proposed to extract psychiatric stressors from Twitter. The goal of this study was to investigate techniques for recognizing suicide related psychiatric stressors from Twitter using deep learning based methods and transfer learning strategy which leverages an existing annotation dataset from clinical text. METHODS First, a dataset of suicide-related tweets was collected from Twitter streaming data with a multiple-step pipeline including keyword-based retrieving, filtering and further refining using an automated binary classifier. Specifically, a convolutional neural networks (CNN) based algorithm was used to build the binary classifier. Next, psychiatric stressors were annotated in the suicide-related tweets. The stressor recognition problem is conceptualized as a typical named entity recognition (NER) task and tackled using recurrent neural networks (RNN) based methods. Moreover, to reduce the annotation cost and improve the performance, transfer learning strategy was adopted by leveraging existing annotation from clinical text. RESULTS & CONCLUSIONS To our best knowledge, this is the first effort to extract psychiatric stressors from Twitter data using deep learning based approaches. Comparison to traditional machine learning algorithms shows the superiority of deep learning based approaches. CNN is leading the performance at identifying suicide-related tweets with a precision of 78% and an F-1 measure of 83%, outperforming Support Vector Machine (SVM), Extra Trees (ET), etc. 
RNN based psychiatric stressors recognition obtains the best F-1 measure of 53.25% by exact match and 67.94% by inexact match, outperforming Conditional Random Fields (CRF). Moreover, transfer learning from clinical notes for the Twitter corpus outperforms the training with Twitter corpus only with an F-1 measure of 54.9% by exact match. The results indicate the advantages of deep learning based methods for the automated stressors recognition from social media.
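The exact-match and inexact-match figures reported in this abstract are two ways of scoring predicted entity spans against gold annotations. A minimal sketch follows, assuming that exact match requires identical span boundaries and that inexact match counts any character overlap. The abstract does not define its matching criterion, so this overlap rule, the function name `span_f1`, and the sample spans are illustrative assumptions only.

```python
# Hypothetical helper for illustration; not taken from the cited paper.
def span_f1(gold, pred, exact=True):
    """F1 over entity spans given as half-open (start, end) character offsets."""

    def matches(a, b):
        if exact:
            return a == b  # boundaries must be identical
        return a[0] < b[1] and b[0] < a[1]  # assumed rule: any overlap counts

    if not gold or not pred:
        return 0.0
    # Precision: share of predicted spans that match some gold span.
    hit_pred = sum(any(matches(p, g) for g in gold) for p in pred)
    # Recall: share of gold spans that are matched by some predicted span.
    hit_gold = sum(any(matches(g, p) for p in pred) for g in gold)
    precision = hit_pred / len(pred)
    recall = hit_gold / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy spans (assumed): the second prediction clips one character off the gold span.
gold_spans = [(0, 3), (10, 15)]
pred_spans = [(0, 3), (11, 15), (20, 22)]

f1_exact = span_f1(gold_spans, pred_spans, exact=True)
f1_inexact = span_f1(gold_spans, pred_spans, exact=False)
```

Under these assumptions a boundary error of a single character is a full miss for exact match but a hit for inexact match, which is consistent with inexact scores being the higher of the two.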
Affiliation(s)
- Jingcheng Du
  - The University of Texas School of Biomedical Informatics, 7000 Fannin St Suite 600, Houston, TX 77030 USA
- Yaoyun Zhang
  - The University of Texas School of Biomedical Informatics, 7000 Fannin St Suite 600, Houston, TX 77030 USA
- Jianhong Luo
  - The University of Texas School of Biomedical Informatics, 7000 Fannin St Suite 600, Houston, TX 77030 USA
  - Department of Management Science and Engineering, Zhejiang Sci-Tech University, Hangzhou, 310018 China
- Yuxi Jia
  - The University of Texas School of Biomedical Informatics, 7000 Fannin St Suite 600, Houston, TX 77030 USA
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, 130021 Jilin China
- Qiang Wei
  - The University of Texas School of Biomedical Informatics, 7000 Fannin St Suite 600, Houston, TX 77030 USA
- Cui Tao
  - The University of Texas School of Biomedical Informatics, 7000 Fannin St Suite 600, Houston, TX 77030 USA
- Hua Xu
  - The University of Texas School of Biomedical Informatics, 7000 Fannin St Suite 600, Houston, TX 77030 USA
9
Zhang Y, Li HJ, Wang J, Cohen T, Roberts K, Xu H. Adapting Word Embeddings from Multiple Domains to Symptom Recognition from Psychiatric Notes. AMIA Joint Summits on Translational Science Proceedings 2018; 2017:281-289. [PMID: 29888086 PMCID: PMC5961810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Mental health is increasingly recognized as an important topic in healthcare. Information concerning psychiatric symptoms is critical for the timely diagnosis of mental disorders, as well as for the personalization of interventions. However, the diversity and sparsity of psychiatric symptoms make it challenging for conventional natural language processing techniques to automatically extract such information from clinical text. To address this problem, this study takes the initiative to use and adapt word embeddings from four source domains (intensive care, biomedical literature, Wikipedia and Psychiatric Forum) to recognize symptoms in the target domain of psychiatry. We investigated four different approaches: 1) using only word embeddings of the source domain, 2) directly combining data from the source and target domains to generate word embeddings, 3) assigning different weights to word embeddings, and 4) retraining the word embedding model of the source domain using a corpus of the target domain. To the best of our knowledge, this is the first work to adapt multiple word embeddings from external domains to improve psychiatric symptom recognition in clinical text. Experimental results showed that the last two approaches outperformed the baseline methods, indicating the effectiveness of our new strategies for leveraging embeddings from other domains.
Affiliation(s)
- Yaoyun Zhang
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Hee-Jin Li
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jingqi Wang
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Trevor Cohen
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Kirk Roberts
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
10
Koola JD, Davis SE, Al-Nimri O, Parr SK, Fabbri D, Malin BA, Ho SB, Matheny ME. Development of an automated phenotyping algorithm for hepatorenal syndrome. J Biomed Inform 2018; 80:87-95. [PMID: 29530803 PMCID: PMC5920557 DOI: 10.1016/j.jbi.2018.03.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/26/2017] [Revised: 02/21/2018] [Accepted: 03/07/2018] [Indexed: 12/27/2022]
Abstract
OBJECTIVE Hepatorenal Syndrome (HRS) is a devastating form of acute kidney injury (AKI) in advanced liver disease patients with high morbidity and mortality, but phenotyping algorithms have not yet been developed using large electronic health record (EHR) databases. We evaluated and compared multiple phenotyping methods to achieve an accurate algorithm for HRS identification. MATERIALS AND METHODS A national retrospective cohort of patients with cirrhosis and AKI admitted to 124 Veterans Affairs hospitals was assembled from electronic health record data collected from 2005 to 2013. AKI was defined by the Kidney Disease: Improving Global Outcomes criteria. Five hundred and four hospitalizations were selected for manual chart review and served as the gold standard. Electronic Health Record based predictors were identified using structured and free text clinical data, subjected through NLP from the clinical Text Analysis Knowledge Extraction System. We explored several dimension reduction techniques for the NLP data, including newer high-throughput phenotyping and word embedding methods, and ascertained their effectiveness in identifying the phenotype without structured predictor variables. With the combined structured and NLP variables, we analyzed five phenotyping algorithms: penalized logistic regression, naïve Bayes, support vector machines, random forest, and gradient boosting. Calibration and discrimination metrics were calculated using 100 bootstrap iterations. In the final model, we report odds ratios and 95% confidence intervals. RESULTS The area under the receiver operating characteristic curve (AUC) for the different models ranged from 0.73 to 0.93; with penalized logistic regression having the best discriminatory performance. Calibration for logistic regression was modest, but gradient boosting and support vector machines were superior. 
NLP identified 6985 variables; a priori variable selection performed similarly to dimensionality reduction using high-throughput phenotyping and semantic similarity informed clustering (AUC of 0.81 - 0.82). CONCLUSION This study demonstrated improved phenotyping of a challenging AKI etiology, HRS, over ICD-9 coding. We also compared performance among multiple approaches to EHR-derived phenotyping, and found similar results between methods. Lastly, we showed that automated NLP dimension reduction is viable for acute illness.
Affiliation(s)
- Jejo D Koola
  - Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA
  - Division of Biomedical Informatics, Department of Medicine, University of California, San Diego, CA, USA
  - Division of Hospital Medicine, Department of Medicine, University of California, San Diego, CA, USA
- Sharon E Davis
  - Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Sharidan K Parr
  - Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA
  - Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Daniel Fabbri
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
- Bradley A Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
- Samuel B Ho
  - VA San Diego Healthcare System, San Diego, CA, USA
  - Division of Gastroenterology, Department of Medicine, University of California, San Diego, CA, USA
- Michael E Matheny
  - Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA
  - Division of General Internal Medicine and Public Health, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
11
Uzuner Ö, Stubbs A, Filannino M. A natural language processing challenge for clinical records: Research Domains Criteria (RDoC) for psychiatry. J Biomed Inform 2017; 75S:S1-S3. [PMID: 29042245 DOI: 10.1016/j.jbi.2017.10.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/28/2017] [Revised: 10/06/2017] [Accepted: 10/09/2017] [Indexed: 02/06/2023]
Affiliation(s)
- Özlem Uzuner
  - Department of Information Sciences and Technology, George Mason University, 4400 University Drive 5359, Nguyen Engineering Bldg, MS 1G8, Fairfax, VA 22030, USA. Tel.: +703-993-8633
- Amber Stubbs
  - School of Library and Information Science, Simmons College, Boston, MA, USA
- Michele Filannino
  - Department of Computer Science, State University of New York at Albany, Albany, NY, USA