1
Da C, Duan Y, Ji Z, Chen J, Xia H, Weng Y, Zhou T, Yuan C, Cai T. Assessing the needs of patients with breast cancer and their families across various treatment phases using a Latent Dirichlet Allocation model: a text-mining approach to online health communities. Support Care Cancer 2024; 32:314. [PMID: 38683417 DOI: 10.1007/s00520-024-08513-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 04/17/2024] [Indexed: 05/01/2024]
Abstract
PURPOSE This study aimed to assess the different needs of patients with breast cancer and their families in online health communities at different treatment phases using a Latent Dirichlet Allocation (LDA) model. METHODS Using Python, breast cancer-related posts were collected from two online health communities: patient-to-patient and patient-to-doctor. After data cleaning, eligible posts were categorized based on the treatment phase. Subsequently, an LDA model identifying the distinct need-related topics for each phase of treatment, including data preprocessing and LDA topic modeling, was established. Additionally, the demographic and interactive features of the posts were manually analyzed. RESULTS We collected 84,043 posts, of which 9504 posts were included after data cleaning. Early diagnosis and rehabilitation treatment phases had the highest and lowest number of posts, respectively. LDA identified 11 topics: three in the initial diagnosis phase and two in each of the remaining treatment phases. The topics included disease outcomes, diagnosis analysis, treatment information, and emotional support in the initial diagnosis phase; surgical options and outcomes, postoperative care, and treatment planning in the perioperative treatment phase; treatment options and costs, side effects management, and disease prognosis assessment in the non-operative treatment phase; diagnosis and treatment options, disease prognosis, and emotional support in the relapse and metastasis treatment phase; and follow-up and recurrence concerns, physical symptoms, and lifestyle adjustments in the rehabilitation treatment phase. CONCLUSION The needs of patients with breast cancer and their families differ across various phases of cancer therapy. Therefore, specific information or emotional assistance should be tailored to each phase of treatment based on the unique needs of patients and their families.
Affiliation(s)
- Chaojin Da
  - Department of Nursing, School of Clinical Nursing, Gansu Health Vocational College, Lanzhou, China
- Yiwen Duan
  - School of Nursing, Fudan University, 305 Fenglin Road, Shanghai, 200032, China
- Zhenying Ji
  - Department of Nursing, School of Clinical Nursing, Gansu Health Vocational College, Lanzhou, China
- Jialin Chen
  - School of Nursing, Fudan University, 305 Fenglin Road, Shanghai, 200032, China
- Haozhi Xia
  - School of Nursing, Fudan University, 305 Fenglin Road, Shanghai, 200032, China
- Yajuan Weng
  - School of Nursing, Fudan University, 305 Fenglin Road, Shanghai, 200032, China
- Tingting Zhou
  - School of Nursing, Fudan University, 305 Fenglin Road, Shanghai, 200032, China
- Changrong Yuan
  - School of Nursing, Fudan University, 305 Fenglin Road, Shanghai, 200032, China
- Tingting Cai
  - School of Nursing, Fudan University, 305 Fenglin Road, Shanghai, 200032, China
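The per-phase LDA topic modeling described in the abstract of Da et al. above can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state their preprocessing, hyperparameters, or library, so the collapsed Gibbs sampler, the function names `fit_lda` and `top_words`, the `alpha` and `beta` values, and the toy tokenized posts are all assumptions made for illustration.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized docs (lists of str)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # count of word v in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    tops = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
        tops.append([vocab[v] for v in ranked[:n]])
    return tops


# Hypothetical, already-tokenized posts grouped by treatment phase.
posts_by_phase = {
    "initial diagnosis": [["biopsy", "result", "worry"], ["biopsy", "report", "result"]],
    "rehabilitation": [["diet", "exercise", "follow-up"], ["exercise", "fatigue", "diet"]],
}
for phase, docs in posts_by_phase.items():
    vocab, topic_word = fit_lda(docs, n_topics=2, n_iter=50)
    print(phase, top_words(vocab, topic_word))
```

Fitting one model per phase, as sketched here, keeps each phase's topics separate. In practice a library implementation and a principled choice of the number of topics would likely replace this toy sampler.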
2
Shah AM, Lee KY, Hidayat A, Falchook A, Muhammad W. A text analytics approach for mining public discussions in online cancer forum: Analysis of multi-intent lung cancer treatment dataset. Int J Med Inform 2024; 184:105375. [PMID: 38367390 DOI: 10.1016/j.ijmedinf.2024.105375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 01/25/2024] [Accepted: 02/07/2024] [Indexed: 02/19/2024]
Abstract
BACKGROUND Online cancer forums (OCF) are increasingly popular platforms for patients and caregivers to discuss, seek information on, and share opinions about diseases and treatments. This interaction generates a substantial amount of unstructured text data, necessitating deeper exploration. Using time series data, our study exploits topic modeling in the novel domain of online cancer forums (OCFs) to identify meaningful topics and changing dynamics of online discussion across different lung cancer treatment intent groups. METHODS For this purpose, a dataset comprising 27,998 forum posts about lung cancer was collected from three OCFs: lungcancer.net, lungevity.org, and reddit.com, spanning the years 2016 to 2018. RESULTS The analysis reflects the public discussion on multi-intent lung cancer treatment over time, taking into account seasonal variations. Discussions on cancer symptoms and prevention garnered the most attention, dominating both curative and palliative care discussions. There were distinct seasonal peaks: curative care topics surged from winter to late spring, while palliative care topics peaked from late summer to mid-autumn. Keyword analysis highlighted that lung cancer diagnosis and treatment were primary topics, whereas cancer prevention and treatment outcomes were predominant across multi-care contexts. For the study period, curative care discussions predominantly revolved around informational support and disease syndromes. In contrast, social support and cancer prevention prevailed in the palliative care context. Notably, topics such as cancer screening and cancer treatment exhibit pronounced seasonal variations in curative care, peaking in frequency during the summers (May to August) of the study period. Meanwhile, the topic of tumor control within palliative care showed significant seasonal influence during the winters and summers of 2017 and 2018. 
CONCLUSION Our text analysis approach using OCF data shows potential for computational methods in this novel domain to gain insights into trends in public cancer communication and seasonal variations for a better understanding of improving personalized care, decision support, treatment outcomes, and quality of life.
Affiliation(s)
- Adnan Muhammad Shah
  - Chair of Marketing and Innovation, University of Hamburg, 20146, Germany
  - Department of Physics, Charles E. Schmidt College of Science, Florida Atlantic University, FL 33431-0991, United States
  - Department of Computer Engineering, Gachon University, Seoul 13120, Republic of Korea
- Kang Yoon Lee
  - Department of Computer Engineering, Gachon University, Seoul 13120, Republic of Korea
- Abdullah Hidayat
  - Department of Physics, Charles E. Schmidt College of Science, Florida Atlantic University, FL 33431-0991, United States
- Aaron Falchook
  - Department of Radiation Oncology, Memorial Hospital West, Memorial Cancer Institute (MCI), Pembroke Pines, FL, United States
- Wazir Muhammad
  - Department of Physics, Charles E. Schmidt College of Science, Florida Atlantic University, FL 33431-0991, United States
3
Xiang M, Zhong D, Han M, Lv K. A Study on Online Health Community Users' Information Demands Based on the BERT-LDA Model. Healthcare (Basel) 2023; 11:2142. [PMID: 37570382 PMCID: PMC10419037 DOI: 10.3390/healthcare11152142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 07/17/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] Open
Abstract
As the economy and society develop and the standard of living improves, people's health awareness increases and the demand for health information grows. This study introduces an advanced BERT-LDA model to conduct topic-sentiment analysis within online health communities. It examines nine primary categories of user information requirements: causes, symptoms and manifestations, examination and diagnosis, treatment, self-management and regulation, impact, prevention, social life, and knowledge acquisition. By analyzing the distribution of positive and negative sentiments across each topic, the correlation between various health information demands and emotional expressions is investigated. The model established in this paper integrates BERT's semantic comprehension with LDA's topic modeling capabilities, enhancing the accuracy of topic identification and sentiment analysis while providing a more comprehensive evaluation of user information demands. This research furthers our understanding of users' emotional reactions and presents valuable insights for delivering personalized health information in online communities.
Affiliation(s)
- Kun Lv
  - Business School, Ningbo University, Ningbo 315211, China (D.Z.)
4
Xu Q, Zhou Y, Liao B, Xin Z, Xie W, Hu C, Luo A. Named Entity Recognition of Diabetes Online Health Community Data Using Multiple Machine Learning Models. Bioengineering (Basel) 2023; 10:659. [PMID: 37370590 DOI: 10.3390/bioengineering10060659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 05/19/2023] [Accepted: 05/25/2023] [Indexed: 06/29/2023] Open
Abstract
The rising prevalence of diabetes and the increasing awareness of self-health management have resulted in a surge in diabetes patients seeking health information and emotional support in online health communities. Consequently, there is a vast database of patient consultation information in these online health communities. However, due to the heterogeneity and incompleteness of the content, mining medical information and patient health data from these communities can be a challenge. To address this issue, we built the RoBERTa-BiLSTM-CRF (RBC) model for identifying entities in the online health community of diabetes. We selected 1889 question-answer texts from the most active online health community in China, Good Doctor Online, and used these public data to identify five types of entities. In addition, to validate the performance of our proposed model, we conducted a comparative evaluation with three other commonly used models: RoBERTa-CRF (RC), BiLSTM-CRF (BC), and RoBERTa-Softmax (RS). The results showed that the RBC model achieved excellent performance on the test set, with an accuracy of 81.2% and an F1 score of 80.7%, outperforming traditional entity recognition models on named entity recognition in online medical communities for doctors and diabetes patients. The high performance of entity recognition in online health communities will provide a crucial knowledge source for constructing medical knowledge graphs. This integration would help alleviate the growing demand for medical consultations and the strain on healthcare resources, while assisting healthcare professionals in making informed decisions and providing personalized services to patients.
Affiliation(s)
- Qian Xu
  - Second Xiangya Hospital, Central South University, Changsha 410011, China
  - School of Life Sciences, Central South University, Changsha 410013, China
  - College of Computer Science and Engineering, Jishou University, Jishou 416000, China
  - Key Laboratory of Medical Information Research, Central South University, College of Hunan Province, Changsha 410013, China
  - Clinical Research Center for Cardiovascular Intelligent Healthcare in Hunan Province, Changsha 410011, China
- Yue Zhou
  - Second Xiangya Hospital, Central South University, Changsha 410011, China
  - School of Life Sciences, Central South University, Changsha 410013, China
  - Key Laboratory of Medical Information Research, Central South University, College of Hunan Province, Changsha 410013, China
  - Clinical Research Center for Cardiovascular Intelligent Healthcare in Hunan Province, Changsha 410011, China
- Bolin Liao
  - College of Computer Science and Engineering, Jishou University, Jishou 416000, China
- Zirui Xin
  - Second Xiangya Hospital, Central South University, Changsha 410011, China
  - Key Laboratory of Medical Information Research, Central South University, College of Hunan Province, Changsha 410013, China
  - Clinical Research Center for Cardiovascular Intelligent Healthcare in Hunan Province, Changsha 410011, China
- Wenzhao Xie
  - Key Laboratory of Medical Information Research, Central South University, College of Hunan Province, Changsha 410013, China
  - Clinical Research Center for Cardiovascular Intelligent Healthcare in Hunan Province, Changsha 410011, China
- Chao Hu
  - Big Data Institute, Central South University, Changsha 410011, China
- Aijing Luo
  - Second Xiangya Hospital, Central South University, Changsha 410011, China
  - Key Laboratory of Medical Information Research, Central South University, College of Hunan Province, Changsha 410013, China
  - Clinical Research Center for Cardiovascular Intelligent Healthcare in Hunan Province, Changsha 410011, China
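The entity-level F1 score reported in the abstract of Xu et al. above is commonly computed by decoding tag sequences into entity spans and comparing predicted spans with gold spans. The sketch below assumes BIO tagging and exact-match scoring. The abstract does not state the tagging scheme, the scoring rule, or the names of the five entity types, so the `DRUG` and `SYMPTOM` labels and the function names are hypothetical.

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into a set of (start, end, label) spans.

    `end` is exclusive. A stray "I-X" that does not continue an open
    "X" entity is treated leniently as the start of a new entity.
    """
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold_tags, pred_tags):
    """Exact-match precision, recall, and F1 over decoded entity spans."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted tags for a four-token question.
gold = ["B-DRUG", "I-DRUG", "O", "B-SYMPTOM"]
pred = ["B-DRUG", "I-DRUG", "O", "O"]
print(entity_f1(gold, pred))
```

Scoring whole spans rather than individual tokens means a partially recognized entity counts as an error, which is why entity-level F1 is usually lower than token-level accuracy.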
5
Singh T, Roberts K, Cohen T, Cobb N, Franklin A, Myneni S. Discerning conversational context in online health communities for personalized digital behavior change solutions using Pragmatics to Reveal Intent in Social Media (PRISM) framework. J Biomed Inform 2023; 140:104324. [PMID: 36842490 PMCID: PMC10206862 DOI: 10.1016/j.jbi.2023.104324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Revised: 02/18/2023] [Accepted: 02/21/2023] [Indexed: 02/28/2023]
Abstract
BACKGROUND Online health communities (OHCs) have emerged as prominent platforms for behavior modification, and the digitization of online peer interactions has afforded researchers with unique opportunities to model multilevel mechanisms that drive behavior change. Existing studies, however, have been limited by a lack of methods that allow the capture of conversational context and socio-behavioral dynamics at scale, as manifested in these digital platforms. OBJECTIVE We develop, evaluate, and apply a novel methodological framework, Pragmatics to Reveal Intent in Social Media (PRISM), to facilitate granular characterization of peer interactions by combining multidimensional facets of human communication. METHODS We developed and applied PRISM to analyze peer interactions (N = 2.23 million) in QuitNet, an OHC for tobacco cessation. First, we generated a labeled set of peer interactions (n = 2,005) through manual annotation along three dimensions: communication themes (CTs), behavior change techniques (BCTs), and speech acts (SAs). Second, we used deep learning models to apply our qualitative codes at scale. Third, we applied our validated model to perform a retrospective analysis. Finally, using social network analysis (SNA), we portrayed large-scale patterns and relationships among the aforementioned communication dimensions embedded in peer interactions in QuitNet. RESULTS Qualitative analysis showed that the themes of social support and behavioral progress were common. The most used BCTs were feedback and monitoring and comparison of behavior, and users most commonly expressed their intentions using SAs-expressive and emotion. With additional in-domain pre-training, bidirectional encoder representations from Transformers (BERT) outperformed other deep learning models on the classification tasks. 
Content-specific SNA revealed that users' engagement or abstinence status is associated with the prevalence of various categories of BCTs and SAs, which also was evident from the visualization of network structures. CONCLUSIONS Our study describes the interplay of multilevel characteristics of online communication and their association with individual health behaviors.
Affiliation(s)
- Tavleen Singh
  - School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, USA
- Kirk Roberts
  - School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, USA
- Trevor Cohen
  - Biomedical Informatics and Medical Education, The University of Washington, Seattle, WA, USA
- Nathan Cobb
  - Georgetown University Medical Center, Washington, DC, USA
- Amy Franklin
  - School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, USA
- Sahiti Myneni
  - School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, USA
6
Medical QA Oriented Multi-Task Learning Model for Question Intent Classification and Named Entity Recognition. INFORMATION 2022. [DOI: 10.3390/info13120581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/16/2022] Open
Abstract
Intent classification and named entity recognition of medical questions are two key subtasks of the natural language understanding module in a question answering system. Most existing methods treat medical query intent classification and named entity recognition as two separate tasks, ignoring the close relationship between them. To improve performance on both tasks, a multi-task learning model based on ALBERT-BILSTM is proposed for intent classification and named entity recognition of Chinese online medical questions. The multi-task learning model in this paper makes use of encoder parameter sharing, which enables the model's underlying network to take into account both named entity recognition and intent classification features. The model learns the shared information between the two tasks while maintaining its unique characteristics during the decoding phase. The ALBERT pre-trained language model is used to obtain word vectors containing semantic information, and the bidirectional LSTM network is used for training. A comparative experiment of different models was conducted on a Chinese medical question dataset. Experimental results show that the proposed multi-task learning method outperforms the benchmark method in terms of precision, recall and F1 value. Compared with the single-task model, the generalization ability of the model is improved.
7
Wang S, Song F, Qiao Q, Liu Y, Chen J, Ma J. A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data. Healthcare (Basel) 2022; 10:1119. [PMID: 35742169 PMCID: PMC9223144 DOI: 10.3390/healthcare10061119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 06/08/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022] Open
Abstract
(1) Background: Poor adherence to management behaviors in Chinese Type 2 diabetes mellitus (T2DM) patients leads to an uncontrolled prognosis of diabetes, which results in significant economic costs for China. It is imperative to quickly locate vulnerability factors in the management behavior of patients with T2DM. (2) Methods: In this study, a thematic analysis of the collected interview materials was conducted to construct the themes of T2DM management vulnerability. We explored the applicability of the pre-trained models based on the evaluation metrics in text classification. (3) Results: We constructed 12 themes of vulnerability related to the health and well-being of people with T2DM in Tianjin. We considered that Bidirectional Encoder Representation from Transformers (BERT) performed better in this Natural Language Processing (NLP) task with a shorter completion time. With the splitting ratio of 6:3:1 and batch size of 64 for BERT, the test accuracy was 97.71%, the completion time was 10 min 24 s, and the macro-F1 score was 0.9752. (4) Conclusions: Our results proved the applicability of NLP techniques in this specific Chinese-language medical environment. We filled the knowledge gap in the application of NLP technologies in diabetes management. Our study provided strong support for using NLP techniques to rapidly locate vulnerability factors in T2DM management.
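The abstract of Wang et al. above reports a 6:3:1 splitting ratio and a macro-F1 score. A minimal sketch of both follows, under stated assumptions: the abstract does not say which part of the ratio is training, validation, or test data, so the train:validation:test reading used here is a guess, and the function names and theme labels are hypothetical.

```python
from collections import Counter


def split_6_3_1(items):
    """Split a sequence into 60% / 30% / 10% parts, in order.

    Assumes the caller has already shuffled `items`; the reading of
    6:3:1 as train:validation:test is an assumption.
    """
    n = len(items)
    n_train = n * 6 // 10
    n_val = n * 3 // 10
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            true_pos[truth] += 1
        else:
            false_pos[guess] += 1
            false_neg[truth] += 1
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        scores.append(2 * true_pos[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical theme labels for four interview excerpts.
y_true = ["diet", "diet", "cost", "cost"]
y_pred = ["diet", "cost", "cost", "cost"]
print(macro_f1(y_true, y_pred))
```

Because macro-F1 averages per-class scores without weighting by class size, a rarely occurring theme influences the score as much as a common one, which matters when the 12 themes are unevenly represented.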
8
Dehdarirad T, Freer J. Is there alignment amongst scientific literature, news media and patient forums regarding topics?: A study of breast and lung cancer. ONLINE INFORMATION REVIEW 2021. [DOI: 10.1108/oir-06-2020-0228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose During recent years, web technologies and mass media have become prevalent in the context of medicine and health. Two examples of important web technologies used in health are news media and patient forums. Both have a significant role in shaping patients' perspective and behaviour in relation to health and illness, as well as the way that they might choose or change their treatment. In this paper, the authors investigated the application of web technologies using the data analysis approach. The authors did this analysis from the point of view of topics being discussed and disseminated via patients and journalists in breast and lung cancer. The study also investigated the (dis)alignment amongst these two groups and scientists in terms of topics. Design/methodology/approach Three data sets comprised documents published between 2014 and 2018 obtained from ProQuest and Web of Science Medline databases, alongside data from three major patient forums on breast and lung cancer. The analysis and visualisation in this paper have been done using the udpipe, igraph R packages and VOSviewer. Findings The study's findings showed that in general scientists focussed more on prognosis and treatment of cancer, whereas patients and journalists focussed more on detection, prevention and role of social and emotional support. The only exception was for news coverage of lung cancer where the largest cluster was related to treatment, research in cancer treatment and therapies. However, when comparing coverage by scientists and journalists in terms of treatment, the focus of news articles in both cancer types was mainly on chemotherapy and complementary therapies. Finally, topics such as lifestyle or pain management were only discussed by breast cancer patients. Originality/value The results obtained from this study may provide valuable insights into topics of interest for each group of scientists, journalists and patients as well as (dis)alignment among them in terms of topics. These findings are important as scientific research is heavily dependent on communication, and research does not exist in a bubble. Scientists and journalists can gain insights from patients' experiences and needs, which in turn may help them to have a more holistic and realistic view. Peer review The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-06-2020-0228
9
Sarker A, DeRoos A, Perrone J. Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework. J Am Med Inform Assoc 2021; 27:315-329. [PMID: 31584645 PMCID: PMC7025330 DOI: 10.1093/jamia/ocz162] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 07/09/2019] [Revised: 08/14/2019] [Indexed: 01/02/2023] Open
Abstract
Objective Prescription medication (PM) misuse and abuse is a major health problem globally, and a number of recent studies have focused on exploring social media as a resource for monitoring nonmedical PM use. Our objectives are to present a methodological review of social media–based PM abuse or misuse monitoring studies, and to propose a potential generalizable, data-centric processing pipeline for the curation of data from this resource. Materials and Methods We identified studies involving social media, PMs, and misuse or abuse (inclusion criteria) from Medline, Embase, Scopus, Web of Science, and Google Scholar. We categorized studies based on multiple characteristics including but not limited to data size; social media source(s); medications studied; and primary objectives, methods, and findings. Results A total of 39 studies met our inclusion criteria, with 31 (∼79.5%) published since 2015. Twitter has been the most popular resource, with Reddit and Instagram gaining popularity recently. Early studies focused mostly on manual, qualitative analyses, with a growing trend toward the use of data-centric methods involving natural language processing and machine learning. Discussion There is a paucity of standardized, data-centric frameworks for curating social media data for task-specific analyses and near real-time surveillance of nonmedical PM use. Many existing studies do not quantify human agreements for manual annotation tasks or take into account the presence of noise in data. Conclusion The development of reproducible and standardized data-centric frameworks that build on the current state-of-the-art methods in data and text mining may enable effective utilization of social media data for understanding and monitoring nonmedical PM use.
Affiliation(s)
- Abeed Sarker
  - Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
- Annika DeRoos
  - College of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jeanmarie Perrone
  - Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
10
Wang X, High A, Wang X, Zhao K. Predicting users' continued engagement in online health communities from the quantity and quality of received support. J Assoc Inf Sci Technol 2020. [DOI: 10.1002/asi.24436] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022]
Affiliation(s)
- Xiangyu Wang
  - Interdisciplinary Graduate Program in Informatics, The University of Iowa, Iowa City, Iowa, USA
- Andrew High
  - Department of Communication Arts and Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
- Xi Wang
  - School of Information, Central University of Finance and Economics, Beijing, China
- Kang Zhao
  - Tippie College of Business, The University of Iowa, Iowa City, Iowa, USA
11
HCI for biomedical decision-making: From diagnosis to therapy. J Biomed Inform 2020; 111:103593. [PMID: 33069887 DOI: 10.1016/j.jbi.2020.103593] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2020] [Accepted: 10/06/2020] [Indexed: 01/08/2023]
12
Wang X, Zhao K, Zhou X, Street N. Predicting User Posting Activities in Online Health Communities with Deep Learning. ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS 2020. [DOI: 10.1145/3383780] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/20/2022]
Abstract
Online health communities (OHCs) represent a great source of social support for patients and their caregivers. Better predictions of user activities in OHCs can help improve user engagement and retention, which are important to manage and sustain a successful OHC. This article proposes a general framework to predict OHC user posting activities. Deep learning methods are adopted to learn from users’ temporal trajectories in both the volumes and content of posts published over time. Experiments based on data from a popular OHC for cancer survivors demonstrate that the proposed approach can improve the performance of user activity predictions. In addition, several topics of users’ posts are found to have strong impact on predicting users’ activities in the OHC.
Affiliation(s)
- Xun Zhou
  - University of Iowa, Iowa City, IA
13
An attention-based multi-task model for named entity recognition and intent analysis of Chinese online medical questions. J Biomed Inform 2020; 108:103511. [DOI: 10.1016/j.jbi.2020.103511] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/27/2020] [Revised: 07/03/2020] [Accepted: 07/07/2020] [Indexed: 01/22/2023]
14
Griffin AC, Topaloglu U, Davis S, Chung AE. From Patient Engagement to Precision Oncology: Leveraging Informatics to Advance Cancer Care. Yearb Med Inform 2020; 29:235-242. [PMID: 32823322 PMCID: PMC7442514 DOI: 10.1055/s-0040-1701983] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022] Open
Abstract
OBJECTIVES Conduct a survey of the literature for advancements in cancer informatics over the last three years in three specific areas where there has been unprecedented growth: 1) digital health; 2) machine learning; and 3) precision oncology. We also highlight the ethical implications and future opportunities within each area. METHODS A search was conducted over a three-year period in two electronic databases (PubMed, Google Scholar) to identify peer-reviewed articles and conference proceedings. Search terms included variations of the following: neoplasms[MeSH], informatics[MeSH], cancer, oncology, clinical cancer informatics, medical cancer informatics. The search returned too many articles for practical review (23,994 from PubMed and 23,100 from Google Scholar). Thus, we conducted searches of key PubMed-indexed informatics journals and proceedings. We further limited our search to manuscripts that demonstrated a clear focus on clinical or translational cancer informatics. Manuscripts were then selected based on their methodological rigor, scientific impact, innovation, and contribution towards cancer informatics as a field or on their impact on cancer care and research. RESULTS Key developments and opportunities in cancer informatics research in the areas of digital health, machine learning, and precision oncology were summarized. CONCLUSION While there are numerous innovations in the field of cancer informatics to advance prevention and clinical care, considerable challenges remain related to data sharing and privacy, digital accessibility, and algorithm biases and interpretation. The implementation and application of these findings in cancer care necessitates further consideration and research.
Affiliation(s)
- Umit Topaloglu
  - Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Sean Davis
  - National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Arlene E. Chung
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
  - UNC Lineberger Comprehensive Cancer Center, Chapel Hill, NC, USA
15
Solikhah S, Matahari R, Utami FP, Handayani L, Marwati TA. Breast cancer stigma among Indonesian women: a case study of breast cancer patients. BMC WOMENS HEALTH 2020; 20:116. [PMID: 32493375 PMCID: PMC7268729 DOI: 10.1186/s12905-020-00983-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/05/2019] [Accepted: 05/26/2020] [Indexed: 02/27/2023]
Abstract
BACKGROUND The stigma experienced by cancer patients stems from the association of cancer with death, as cancer is the most feared disease worldwide, especially among cancer patients and their families. The stigma regarding breast cancer screening behaviour has not been critically evaluated and is poorly understood; therefore, we aimed to analyse the stigmatization of breast cancer patients in Indonesia to reduce the morbidity and mortality of breast cancer. METHODS A qualitative study using a focus group discussion (FGD) and in-depth interviews with thematic analysis was conducted. RESULTS One informant experienced breast pain and kept the referral letter, in which the medical doctor advised medical treatment, to herself for 3 months due to her embarrassment. A traditional healing practice known as 'kerokan', which involves scraping of the skin, and consumption of a traditional drink were used by most informants to decrease their breast pain. Finally, most informants were diagnosed with an advanced stage of cancer when they returned to the health care facility. In addition, financial difficulties were noted as barriers to breast cancer screening in Indonesia. CONCLUSIONS Feelings of fear and shame when diagnosed with breast cancer were reported by the informants in this study. Alternative treatment known as 'kerokan' was the first treatment sought for breast cancer symptoms due to financial difficulties among breast cancer patients. Informants were diagnosed with an advanced stage of cancer after they returned to the health care facility. A better understanding of early breast cancer symptoms could motivate women to seek out breast cancer treatment.
Affiliation(s)
- Solikhah Solikhah: Faculty of Public Health, Universitas Ahmad Dahlan, Yogyakarta, 55164, Indonesia
- Ratu Matahari: Faculty of Public Health, Universitas Ahmad Dahlan, Yogyakarta, 55164, Indonesia
- Fitriana Putri Utami: Faculty of Public Health, Universitas Ahmad Dahlan, Yogyakarta, 55164, Indonesia
- Lina Handayani: Faculty of Public Health, Universitas Ahmad Dahlan, Yogyakarta, 55164, Indonesia
- Tri Ani Marwati: Faculty of Public Health, Universitas Ahmad Dahlan, Yogyakarta, 55164, Indonesia
|
16
|
Yin Z, Harrell M, Warner JL, Chen Q, Fabbri D, Malin BA. The therapy is making me sick: how online portal communications between breast cancer patients and physicians indicate medication discontinuation. J Am Med Inform Assoc 2019; 25:1444-1451. [PMID: 30380083 DOI: 10.1093/jamia/ocy118] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/07/2018] [Accepted: 08/10/2018] [Indexed: 12/13/2022] Open
Abstract
Objective Online platforms have created a variety of opportunities for breast cancer patients to discuss their hormonal therapy, a long-term adjuvant treatment to reduce the chance of breast cancer occurrence and mortality. The goal of this investigation is to ascertain the extent to which the messages breast cancer patients communicated through an online portal can indicate their potential for discontinuing hormonal therapy. Materials and Methods We studied the de-identified electronic medical records of 1106 breast cancer patients who were prescribed hormonal therapy at Vanderbilt University Medical Center over a 12-year period. We designed a data-driven approach to investigate patients' patterns of messaging with healthcare providers, the topics they communicated, and the extent to which these messaging behaviors associate with the likelihood that a patient will discontinue a prescribed 5-year regimen of therapy. Results The results indicate that messaging rate over time [hazard ratio (HR) = 1.373, P = 0.002], mentions of side effects (HR = 1.214, P = 0.006), and surgery-related topics (HR = 1.170, P = 0.034) were associated with increased risk of early medication discontinuation. In contrast, seeking professional suggestions (HR = 0.766, P = 0.002), expressing gratitude to healthcare providers (HR = 0.872, P = 0.044), and mentions of drugs used to treat side effects (HR = 0.807, P = 0.013) were associated with decreased risk of medication discontinuation. Discussion and Conclusion This investigation suggests that patient-generated content can inform the study of health-related behaviors. Given that approximately 50% of breast cancer patients do not complete a course of hormonal therapy as described, the identification of factors associated with medication discontinuation can facilitate real-time interventions to prevent early discontinuation.
Affiliation(s)
- Zhijun Yin: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jeremy L Warner: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Qingxia Chen: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Daniel Fabbri: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
- Bradley A Malin: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
|
17
|
Manas S, Young LE, Fujimoto K, Franklin A, Myneni S. Exploring the Social Structure of a Health-Related Online Community for Tobacco Cessation: A Two-Mode Network Approach. Stud Health Technol Inform 2019; 264:1268-1272. [PMID: 31438129 PMCID: PMC7656969 DOI: 10.3233/shti190430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Abstract
Unhealthy behaviors, such as tobacco use, increase individual health risk while also creating a global economic burden on the healthcare system. Social ties have been seen as an important, yet complex factor, to sustain abstinence from these modifiable risk behaviors. However, the underlying social mechanisms are still opaque and poorly understood. Digital health communities provide opportunities to understand social dependencies of behavior change because peer interactions in these platforms are digitized. In this paper, we present a novel approach that integrates theories of behavior change and Exponential Random Graph Models (ERGMs) to understand structural dependencies between users of an online community and the behavior change techniques that are manifested in their communication using an affiliation network. Results indicate population specific traits in terms of individuals' engagement in peer communication embed behavior change techniques in online social settings. Implications for personalized health promotion technologies are discussed.
Affiliation(s)
- Shruthi Manas: Department of Biomedical Informatics, University of Texas, Houston, Texas, USA
- Lindsay E Young: Department of Medicine, University of Chicago, Chicago, Illinois, USA
- Kayo Fujimoto: Department of Public Health, University of Texas, Houston, Texas, USA
- Amy Franklin: Department of Biomedical Informatics, University of Texas, Houston, Texas, USA
- Sahiti Myneni: Department of Biomedical Informatics, University of Texas, Houston, Texas, USA
|
18
|
Conway M, Hu M, Chapman WW. Recent Advances in Using Natural Language Processing to Address Public Health Research Questions Using Social Media and Consumer-Generated Data. Yearb Med Inform 2019; 28:208-217. [PMID: 31419834 PMCID: PMC6697505 DOI: 10.1055/s-0039-1677918] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/11/2022] Open
Abstract
OBJECTIVE We present a narrative review of recent work on the utilisation of Natural Language Processing (NLP) for the analysis of social media (including online health communities) specifically for public health applications. METHODS We conducted a literature review of NLP research that utilised social media or online consumer-generated text for public health applications, focussing on the years 2016 to 2018. Papers were identified in several ways, including PubMed searches and the inspection of recent conference proceedings from the Association for Computational Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM). Popular data sources included Twitter, Reddit, various online health communities, and Facebook. RESULTS In the recent past, communicable diseases (e.g., influenza, dengue) have been the focus of much social media-based NLP health research. However, mental health and substance use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) have been the subject of an increasing volume of research in the 2016-2018 period. Associated with this trend, the use of lexicon-based methods remains popular given the availability of psychologically validated lexical resources suitable for mental health and substance abuse research. Finally, we found that in the period under review "modern" machine learning methods (i.e. deep neural-network-based methods), while increasing in popularity, remain less widely used than "classical" machine learning methods.
Affiliation(s)
- Mike Conway: Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States
- Mengke Hu: Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States
- Wendy W Chapman: Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States
|
19
|
Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Inform Assoc 2019; 26:561-576. [PMID: 30908576 PMCID: PMC7647332 DOI: 10.1093/jamia/ocz009] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 09/23/2018] [Revised: 01/06/2019] [Accepted: 01/11/2019] [Indexed: 02/07/2023] Open
Abstract
OBJECTIVE User-generated content (UGC) in online environments provides opportunities to learn an individual's health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. MATERIALS AND METHODS We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. RESULTS We identified 103 eligible studies which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. CONCLUSIONS The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving clinical credibility of UGC, and model interpretability.
Affiliation(s)
- Zhijun Yin: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Lina M Sulieman: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Bradley A Malin: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
|
20
|
Zhang Z, Hu Z, Yang H, Zhu R, Zuo D. Factorization machines and deep views-based co-training for improving answer quality prediction in online health expert question-answering services. J Biomed Inform 2018; 87:21-36. [PMID: 30240803 DOI: 10.1016/j.jbi.2018.09.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/17/2018] [Revised: 08/27/2018] [Accepted: 09/17/2018] [Indexed: 11/26/2022]
Abstract
In online health expert question-answering (HQA) services, it is important to automatically determine the quality of the answers. There are two prominent challenges in this task. First, the answers are usually written in short text, which makes it difficult to absorb the text semantic information. Second, labeled data are usually scarce, while unlabeled data are abundant. To tackle these challenges, we propose a novel deep co-training framework based on factorization machines (FM) and deep textual views to intelligently and automatically identify the quality of answers in HQA systems. More specifically, we exploit additional domain-specific semantic information from domain-specific word embeddings to expand the semantic space of short text and apply FM to excavate the non-independent interaction relationships among diverse features within individual views for improving the performance of the base classifier via co-training. Our learned deep textual views, the convolutional neural networks (CNN) view which focuses on extracting local features using convolution filters to locally model short text and the dependency-sensitive convolutional neural networks (DSCNN) view which focuses on capturing long-distance dependency information within the text to globally model short text, can then overcome the challenge of feature sparseness in the short text answers from the doctors. The developed co-training framework can effectively mine the highly non-linear semantic information embedded in the unlabeled data and expose the highly non-linear relationships between different views, which minimizes the labeling effort. Finally, we conduct extensive empirical evaluations and demonstrate that our proposed method can significantly improve the predictive performance of the answer quality in the context of HQA services.
Affiliation(s)
- Zhan Zhang: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ze Hu: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Haiqin Yang: Department of Computing, Hang Seng Management College, Hong Kong; MTdata, Meitu, China
- Rong Zhu: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Decheng Zuo: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
21
|
Myneni S, Sridharan V, Cobb N, Cohen T. Content-Sensitive Characterization of Peer Interactions of Highly Engaged Users in an Online Community for Smoking Cessation: Mixed-Methods Approach for Modeling User Engagement in Health Promotion Interventions. J Particip Med 2018; 10:e9. [PMID: 33052116 PMCID: PMC7434072 DOI: 10.2196/jopm.9745] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/29/2017] [Revised: 05/16/2018] [Accepted: 06/22/2018] [Indexed: 11/13/2022] Open
Abstract
Background Online communities provide affordable venues for behavior change. However, active user engagement holds the key to the success of these platforms. In order to enhance user engagement and in turn, health outcomes, it is essential to offer targeted interventional and informational support. Objective In this paper, we describe a content plus frequency framework to enable the characterization of highly engaged users in online communities and study theoretical techniques employed by these users through analysis of exchanged communication. Methods We applied the proposed methodology for analysis of peer interactions within QuitNet, an online community for smoking cessation. Firstly, we identified 144 highly engaged users based on communication frequency within QuitNet over a period of 16 years. Secondly, we used the taxonomy of behavior change techniques, text analysis methods from distributional semantics, machine learning, and sentiment analysis to assign theory-driven labels to content. Finally, we extracted content-specific insights from peer interactions (n=159,483 messages) among highly engaged QuitNet users. Results Studying user engagement using our proposed framework led to the definition of 3 user categories—conversation initiators, conversation attractors, and frequent posters. Specific behavior change techniques employed by top tier users (threshold set at top 3) within these 3 user groups were found to be goal setting, social support, rewards and threat, and comparison of outcomes. Engagement-specific trends within sentiment manifestations were also identified. Conclusions Use of content-inclusive analytics has offered deep insight into specific behavior change techniques employed by highly engaged users within QuitNet. Implications for personalization and active user engagement are discussed.
Affiliation(s)
- Sahiti Myneni: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Vishnupriya Sridharan: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Nathan Cobb: Georgetown University Medical Center, Washington, DC, United States
- Trevor Cohen: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
|
22
|
Tapi Nzali MD, Aze J, Bringay S, Lavergne C, Mollevi C, Optiz T. Reconciliation of patient/doctor vocabulary in a structured resource. Health Informatics J 2018; 25:1219-1231. [PMID: 29332530 DOI: 10.1177/1460458217751014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
Today, social media is increasingly used by patients to openly discuss their health. Automatically mining such data is a challenging task because of the unstructured nature of the text and the use of many abbreviations and slang terms. Our goal is to use Patient Authored Text to build a French Consumer Health Vocabulary in the breast cancer field, by collecting various kinds of non-experts' expressions that are related to their diseases and then comparing them to biomedical terms used by health care professionals. We combine several methods from the literature based on linguistic and statistical approaches to extract candidate terms used by non-experts and to link them to expert terms. We use messages extracted from the forum on 'cancerdusein.org' and a vocabulary dedicated to breast cancer elaborated by the Institut National Du Cancer. We have built an efficient vocabulary composed of 192 validated relationships and formalized in Simple Knowledge Organization System ontology.
|
23
|
Rios A, Kavuluru R. Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores. J Biomed Inform 2017; 75S:S85-S93. [PMID: 28506904 PMCID: PMC5682241 DOI: 10.1016/j.jbi.2017.05.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/01/2017] [Revised: 04/04/2017] [Accepted: 05/10/2017] [Indexed: 10/19/2022]
Abstract
BACKGROUND The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. OBJECTIVE Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. METHODS We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance specific predictions. RESULTS Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance specific prediction interpretation by identifying words that led to a particular decision. 
CONCLUSION In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches.
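The normalized score quoted in the abstract above, 100·(1-MMAE), can be illustrated with a small sketch. This is not the shared task's official scorer: it assumes severity levels are encoded as integers 0 to 3 (absent to severe), that the per-class mean absolute error is divided by the largest possible error for that class so that MMAE falls in [0, 1], and that classes are weighted equally. The function name `normalized_mmae_score` is hypothetical.

```python
from collections import defaultdict

# Ordinal scale named in the abstract; the integer encoding 0..3 is an assumption.
SEVERITY = ["absent", "mild", "moderate", "severe"]


def normalized_mmae_score(y_true, y_pred, n_levels=len(SEVERITY)):
    """Return 100 * (1 - MMAE), where MMAE is the macro-averaged,
    per-class-normalized mean absolute error (assumed definition)."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))

    per_class = []
    for level, errs in errors.items():
        # Largest possible miss for this true level on a 0..n_levels-1 scale.
        max_err = max(level, n_levels - 1 - level)
        per_class.append(sum(errs) / len(errs) / max_err)

    # Macro average: every observed class counts equally, regardless of size.
    mmae = sum(per_class) / len(per_class)
    return 100.0 * (1.0 - mmae)
```

Under these assumptions a perfect prediction scores 100, and predictions that are maximally wrong for every class score 0. Because each class counts equally, errors on a rare severity level cost as much as errors on a common one.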
Affiliation(s)
- Anthony Rios: Department of Computer Science, University of Kentucky, 329 Rose Street, Lexington, KY 40506, USA
- Ramakanth Kavuluru: Department of Computer Science, University of Kentucky, 329 Rose Street, Lexington, KY 40506, USA; Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, 725 Rose Street, Lexington, KY 40536, USA
|
24
|
Amith M, Cunningham R, Savas LS, Boom J, Schvaneveldt R, Tao C, Cohen T. Using Pathfinder networks to discover alignment between expert and consumer conceptual knowledge from online vaccine content. J Biomed Inform 2017; 74:33-45. [PMID: 28823922 DOI: 10.1016/j.jbi.2017.08.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/23/2017] [Revised: 05/28/2017] [Accepted: 08/14/2017] [Indexed: 10/19/2022]
Abstract
This study demonstrates the use of distributed vector representations and Pathfinder Network Scaling (PFNETS) to represent online vaccine content created by health experts and by laypeople. By analyzing a target audience's conceptualization of a topic, domain experts can develop targeted interventions to improve the basic health knowledge of consumers. The underlying assumption is that the content created by different groups reflects the mental organization of their knowledge. Applying automated text analysis to this content may elucidate differences between the knowledge structures of laypeople (health consumers) and professionals (health experts). This paper utilizes vaccine information generated by laypeople and health experts to investigate the utility of this approach. We used an established technique from cognitive psychology, Pathfinder Network Scaling, to infer the structure of the associational networks between concepts learned from online content using methods of distributional semantics. In doing so, we extend the original application of PFNETS to infer knowledge structures from individual participants, to infer the prevailing knowledge structures within communities of content authors. The resulting graphs reveal opportunities for public health and vaccination education experts to improve communication and intervention efforts directed towards health consumers. Our efforts demonstrate the feasibility of using an automated procedure to examine the manifestation of conceptual models within large bodies of free text, revealing evidence of conflicting understanding of vaccine concepts among health consumers as compared with health experts. Additionally, this study provides insight into the differences between consumer and expert abstraction of domain knowledge, revealing vaccine-related knowledge gaps that suggest opportunities to improve provider-patient communication.
Affiliation(s)
- Muhammad Amith: The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St, #600, Houston, TX, United States
- Rachel Cunningham: Texas Children's Hospital, 6621 Fannin St, Houston, TX, United States
- Lara S Savas: The University of Texas School of Public Health at Houston, 1200 Pressler Street, Houston, TX 77030, United States
- Julie Boom: Texas Children's Hospital, 6621 Fannin St, Houston, TX, United States
- Roger Schvaneveldt: Arizona State University, Tempe, AZ, United States; New Mexico State University, Las Cruces, NM, United States
- Cui Tao: The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St, #600, Houston, TX, United States
- Trevor Cohen: The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St, #600, Houston, TX, United States
|
25
|
Tapi Nzali MD, Bringay S, Lavergne C, Mollevi C, Opitz T. What Patients Can Tell Us: Topic Analysis for Social Media on Breast Cancer. JMIR Med Inform 2017; 5:e23. [PMID: 28760725 PMCID: PMC5556259 DOI: 10.2196/medinform.7779] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 04/05/2017] [Revised: 06/16/2017] [Accepted: 06/17/2017] [Indexed: 11/13/2022] Open
Abstract
Background Social media dedicated to health are increasingly used by patients and health professionals. They are rich textual resources with content generated through free exchange between patients. We are proposing a method to tackle the problem of retrieving clinically relevant information from such social media in order to analyze the quality of life of patients with breast cancer. Objective Our aim was to detect the different topics discussed by patients on social media and to relate them to functional and symptomatic dimensions assessed in the internationally standardized self-administered questionnaires used in cancer clinical trials (European Organization for Research and Treatment of Cancer [EORTC] Quality of Life Questionnaire Core 30 [QLQ-C30] and breast cancer module [QLQ-BR23]). Methods First, we applied a classic text mining technique, latent Dirichlet allocation (LDA), to detect the different topics discussed on social media dealing with breast cancer. We applied the LDA model to 2 datasets composed of messages extracted from public Facebook groups and from a public health forum (cancerdusein.org, a French breast cancer forum) with relevant preprocessing. Second, we applied a customized Jaccard coefficient to automatically compute similarity distance between the topics detected with LDA and the questions in the self-administered questionnaires used to study quality of life. Results Among the 23 topics present in the self-administered questionnaires, 22 matched with the topics discussed by patients on social media. Interestingly, these topics corresponded to 95% (22/23) of the forum and 86% (20/23) of the Facebook group topics. These figures underline that topics related to quality of life are an important concern for patients. However, 5 social media topics had no corresponding topic in the questionnaires, which do not cover all of the patients’ concerns. 
Of these 5 topics, 2 could potentially be used in the questionnaires, and these 2 topics corresponded to a total of 3.10% (523/16,868) of topics in the cancerdusein.org corpus and 4.30% (3014/70,092) of the Facebook corpus. Conclusions We found a good correspondence between detected topics on social media and topics covered by the self-administered questionnaires, which substantiates the sound construction of such questionnaires. We detected new emerging topics from social media that can be used to complete current self-administered questionnaires. Moreover, we confirmed that social media mining is an important source of information for complementary analysis of quality of life.
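The topic-to-questionnaire matching step described in the abstract above can be sketched with a plain Jaccard coefficient. The authors report a customized variant whose details are not given here, so the standard set-overlap definition is used as a stand-in. The word lists and the names `jaccard` and `best_match` are hypothetical and only illustrate the idea of pairing each detected topic with its most similar questionnaire item.

```python
def jaccard(a, b):
    """Standard Jaccard coefficient |A ∩ B| / |A ∪ B| between two word collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(topic_words, questionnaire_items):
    """Return (item_name, score) for the questionnaire item most similar to a topic."""
    scores = {
        name: jaccard(topic_words, words)
        for name, words in questionnaire_items.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical inputs: top words of one LDA topic and two questionnaire dimensions.
topic = ["fatigue", "tired", "sleep", "rest"]
items = {
    "fatigue": ["tired", "rest", "weak"],
    "pain": ["pain", "ache"],
}
match = best_match(topic, items)
```

A topic whose best score stays near zero for every item would correspond to the unmatched social media topics mentioned in the results; a threshold for "no match" would have to be chosen separately.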
Affiliation(s)
- Mike Donald Tapi Nzali: Institut Montpelliérain Alexander Grothendieck (IMAG), Department of Mathematics, Montpellier University, Montpellier, France; Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Department of Computer Science, Montpellier University, Montpellier, France
- Sandra Bringay: Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Department of Computer Science, Montpellier University, Montpellier, France; Paul Valery University, Montpellier, France
- Christian Lavergne: Institut Montpelliérain Alexander Grothendieck (IMAG), Department of Mathematics, Montpellier University, Montpellier, France; Paul Valery University, Montpellier, France
- Caroline Mollevi: Biometrics Unit, Institut du Cancer Montpellier (ICM), Montpellier, France
- Thomas Opitz: BioSP Unit, Institut National de la Recherche Agronomique (INRA), Avignon, France
|
26
|
Zhang S, Kang T, Qiu L, Zhang W, Yu Y, Elhadad N. Cataloguing Treatments Discussed and Used in Online Autism Communities. PROCEEDINGS OF THE ... INTERNATIONAL WORLD-WIDE WEB CONFERENCE. INTERNATIONAL WWW CONFERENCE 2017; 2017:123-131. [PMID: 28736777 PMCID: PMC5516208 DOI: 10.1145/3038912.3052661] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
A large number of patients discuss treatments in online health communities (OHCs). One research question of interest to health researchers is whether treatments being discussed in OHCs are eventually used by community members in their real lives. In this paper, we rely on machine learning methods to automatically identify attributions of mentions of treatments from an online autism community. The context of our work is online autism communities, where parents exchange support for the care of their children with autism spectrum disorder. Our methods are able to distinguish discussions of treatments that are associated with patients, caregivers, and others, as well as identify whether a treatment is actually taken. We investigate treatments that are not just discussed but also used by patients according to two types of content analysis, cross-sectional and longitudinal. The treatments identified through our content analysis help create a catalogue of real-world treatments. The results of this study lay the foundation for future research to compare real-world drug usage with established clinical guidelines.
Affiliation(s)
- Shaodian Zhang
  - Department of Biomedical Informatics, Columbia University, New York, NY, US
- Tian Kang
  - Department of Biomedical Informatics, Columbia University, New York, NY, US
- Lin Qiu
  - Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China
- Weinan Zhang
  - Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China
- Yong Yu
  - Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China
- Noémie Elhadad
  - Department of Biomedical Informatics, Columbia University, New York, NY, US
|
27
|
Zhang S, Elhadad N. Factors Contributing to Dropping-out in an Online Health Community: Static and Longitudinal Analyses. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:2090-2099. [PMID: 28269969 PMCID: PMC5333218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Dropping-out, which refers to when an individual abandons an intervention, is common in Internet-based studies as well as in online health communities. Community facilitators and health researchers are interested in this phenomenon because it usually indicates dissatisfaction towards the community and/or its failure to deliver expected benefits. In this study, we propose a method to identify dropout members from a large public online breast cancer community. We then study quantitatively what longitudinal factors of participation are correlated with dropping-out. Our experimental results suggest that dropout members discuss diagnosis- and treatment-related topics more than other topics. Furthermore, in the time before withdrawing from the community, dropout members tend to initiate more discussions but do not receive adequate response from the other members. We also discuss implications of our results and challenges in dropout-member identification. This study contributes to further understanding community participation and opens up a number of future research questions.
Affiliation(s)
- Shaodian Zhang
  - Biomedical Informatics, Columbia University, New York, NY
- Noémie Elhadad
  - Biomedical Informatics, Columbia University, New York, NY
|
28
|
Sridharan V, Cohen T, Cobb N, Myneni S. Characterization of Temporal Semantic Shifts of Peer-to-Peer Communication in a Health-Related Online Community: Implications for Data-driven Health Promotion. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:1977-1986. [PMID: 28269957 PMCID: PMC5333293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
With online social platforms gaining popularity as venues of behavior change, it is important to understand the ways in which these platforms facilitate peer interactions. In this paper, we characterize temporal trends in user communication through mapping of theoretically-linked semantic content. We used qualitative coding and automated text analysis to assign theoretical techniques to peer interactions in an online community for smoking cessation, subsequently facilitating temporal visualization of the observed techniques. Results indicate manifestation of several behavior change techniques such as 'feedback and monitoring' and 'rewards'. Automated methods yielded reasonable results (F-measure=0.77). Temporal trends among relapsers revealed reduction in communication after a relapse event. This social withdrawal may be attributed to failure guilt after the relapse. Results indicate significant change in thematic categories such as 'social support', 'natural consequences', and 'comparison of outcomes' pre and post relapse. Implications for development of behavioral support technologies that promote long-term abstinence are discussed.
Affiliation(s)
- Trevor Cohen
  - The University of Texas School of Biomedical Informatics at Houston, TX, USA
- Nathan Cobb
  - Georgetown University Medical Center, Washington, DC, United States
- Sahiti Myneni
  - The University of Texas School of Biomedical Informatics at Houston, TX, USA
|
29
|
Torii M, Tilak SS, Doan S, Zisook DS, Fan JW. Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics. BIOMEDICAL INFORMATICS INSIGHTS 2016; 8:1-11. [PMID: 27375358 PMCID: PMC4915789 DOI: 10.4137/bii.s37791] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/15/2016] [Revised: 05/01/2016] [Accepted: 05/17/2016] [Indexed: 11/25/2022]
Abstract
In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research.
Affiliation(s)
- Manabu Torii
  - Medical Informatics, Kaiser Permanente Southern California, San Diego, CA, USA
- Sameer S Tilak
  - Medical Informatics, Kaiser Permanente Southern California, San Diego, CA, USA
- Son Doan
  - Medical Informatics, Kaiser Permanente Southern California, San Diego, CA, USA
- Daniel S Zisook
  - Medical Informatics, Kaiser Permanente Southern California, San Diego, CA, USA
- Jung-Wei Fan
  - Medical Informatics, Kaiser Permanente Southern California, San Diego, CA, USA
|