1
Differing Content and Language Based on Poster-Patient Relationships on the Chinese Social Media Platform Weibo: Text Classification, Sentiment Analysis, and Topic Modeling of Posts on Breast Cancer. JMIR Cancer 2024; 10:e51332. [PMID: 38723250 PMCID: PMC11117131 DOI: 10.2196/51332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/19/2023] [Accepted: 04/04/2024] [Indexed: 05/26/2024] Open
Abstract
BACKGROUND Breast cancer affects the lives of not only those diagnosed but also the people around them. Many of those affected share their experiences on social media. However, these narratives may differ according to who the poster is and what their relationship with the patient is; a patient posting about their experiences may post different content from someone whose friend or family member has breast cancer. Weibo is one of the most popular social media platforms in China, and breast cancer-related posts are frequently found there. OBJECTIVE With the goal of understanding the different experiences of those affected by breast cancer in China, we aimed to explore how content and language used in relevant posts differ according to who the poster is and what their relationship with the patient is and whether there are differences in emotional expression and topic content if the patient is the poster themselves or a friend, family member, relative, or acquaintance. METHODS We used Weibo as a resource to examine how posts differ according to the different poster-patient relationships. We collected a total of 10,322 relevant Weibo posts. Using a 2-step analysis method, we fine-tuned 2 Chinese Robustly Optimized Bidirectional Encoder Representations from Transformers (BERT) Pretraining Approach models on this data set with annotated poster-patient relationships. These models were chained in sequence, first a binary classifier (no_patient or patient) and then a multiclass classifier (post_user, family_members, friends_relatives, acquaintances, heard_relation), to classify poster-patient relationships. Next, we used the Linguistic Inquiry and Word Count lexicon to conduct sentiment analysis across 5 emotion categories (positive and negative emotions, anger, sadness, and anxiety), followed by topic modeling (BERTopic). RESULTS Our binary model (F1-score=0.92) and multiclass model (F1-score=0.83) were largely able to classify poster-patient relationships accurately. 
Subsequent sentiment analysis showed significant differences in emotion categories across all poster-patient relationships. Notably, negative emotions and anger were higher for the "no_patient" class, but sadness and anxiety were higher for the "family_members" class. Focusing on the top 30 topics, we also noted that topics on fears and anger toward cancer were higher in the "no_patient" class, but topics on cancer treatment were higher in the "family_members" class. CONCLUSIONS Chinese users post different types of content, depending on the poster-patient relationship. If the patient is family, posts are sadder and more anxious but also contain more content on treatments. However, if no patient is detected, posts show higher levels of anger. We think that this anger may stem from posters' rants, which may help with emotion regulation and gathering social support.
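The 2-step classification described in the abstract above can be sketched as a simple cascade. This is only an illustration: the two keyword functions are hypothetical stand-ins for the fine-tuned models, and only the class labels come from the abstract.

```python
from typing import Callable

Classifier = Callable[[str], str]


def classify_relationship(post: str,
                          binary_clf: Classifier,
                          multiclass_clf: Classifier) -> str:
    """Two-step cascade: detect a patient first, then the relationship type."""
    if binary_clf(post) == "no_patient":
        return "no_patient"
    # Only posts that mention a patient reach the multiclass step.
    return multiclass_clf(post)


# Hypothetical keyword stubs standing in for the two fine-tuned models.
def toy_binary(post: str) -> str:
    return "patient" if "diagnosed" in post else "no_patient"


def toy_multiclass(post: str) -> str:
    return "family_members" if "mother" in post else "post_user"


label = classify_relationship("my mother was diagnosed last year",
                              toy_binary, toy_multiclass)
print(label)
```

Running the binary step first means the multiclass model never has to represent the "no_patient" case, which is presumably why the two models are applied in sequence.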
2
Adverse Event Signal Detection Using Patients' Concerns in Pharmaceutical Care Records: Evaluation of Deep Learning Models. J Med Internet Res 2024; 26:e55794. [PMID: 38625718 PMCID: PMC11061790 DOI: 10.2196/55794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 02/14/2024] [Accepted: 03/09/2024] [Indexed: 04/17/2024] Open
Abstract
BACKGROUND Early detection of adverse events and their management are crucial to improving anticancer treatment outcomes, and listening to patients' subjective opinions (patients' voices) can make a major contribution to improving safety management. Recent progress in deep learning technologies has enabled various new approaches for the evaluation of safety-related events based on patient-generated text data, but few studies have focused on the improvement of real-time safety monitoring for individual patients. In addition, no study has yet been performed to validate deep learning models for screening patients' narratives for clinically important adverse event signals that require medical intervention. In our previous work, we developed novel deep learning models to detect adverse event signals for hand-foot syndrome or adverse events limiting patients' daily lives from the authored narratives of patients with cancer, aiming ultimately to use them as safety monitoring support tools for individual patients. OBJECTIVE This study was designed to evaluate whether our deep learning models can screen clinically important adverse event signals that require intervention by health care professionals. The applicability of our deep learning models to data on patients' concerns at pharmacies was also assessed. METHODS Pharmaceutical care records at community pharmacies were used for the evaluation of our deep learning models. The records followed the SOAP format, consisting of subjective (S), objective (O), assessment (A), and plan (P) columns. Because these records uniquely combine patients' concerns in the S column with the professional records of the pharmacists, they were considered a suitable data source for the present purpose. Our deep learning models were applied to the S records of patients with cancer, and the extracted adverse event signals were assessed in relation to medical actions and prescribed drugs. 
RESULTS From 30,784 S records of 2479 patients with at least 1 prescription of anticancer drugs, our deep learning models extracted true adverse event signals with more than 80% accuracy for both hand-foot syndrome (n=152, 91%) and adverse events limiting patients' daily lives (n=157, 80.1%). The deep learning models were also able to screen adverse event signals that require medical intervention by health care providers. The extracted adverse event signals could reflect the side effects of anticancer drugs used by the patients based on analysis of prescribed anticancer drugs. "Pain or numbness" (n=57, 36.3%), "fever" (n=46, 29.3%), and "nausea" (n=40, 25.5%) were common symptoms out of the true adverse event signals identified by the model for adverse events limiting patients' daily lives. CONCLUSIONS Our deep learning models were able to screen clinically important adverse event signals that require intervention for symptoms. It was also confirmed that these deep learning models could be applied to patients' subjective information recorded in pharmaceutical care records accumulated during pharmacists' daily work.
3
Exploring the Impact of the COVID-19 Pandemic on Twitter in Japan: Qualitative Analysis of Disrupted Plans and Consequences. JMIR INFODEMIOLOGY 2024; 4:e49699. [PMID: 38557446 PMCID: PMC10986681 DOI: 10.2196/49699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 08/11/2023] [Accepted: 03/06/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND The impact of the COVID-19 pandemic extends beyond public health, influencing areas such as the economy, education, work style, and social relationships. Research studies that document public opinions and estimate the long-term potential impact after the pandemic can be of value to the field. OBJECTIVE This study aims to uncover and track concerns in Japan throughout the COVID-19 pandemic by analyzing Japanese individuals' self-disclosure of disruptions to their life plans on social media. This approach offers alternative evidence for identifying concerns that may require further attention for individuals living in Japan. METHODS We extracted 300,778 tweets using the query phrase Corona-no-sei ("due to COVID-19," "because of COVID-19," or "considering COVID-19"), enabling us to identify the activities and life plans disrupted by the pandemic. The correlation between the number of tweets and COVID-19 cases was analyzed, along with an examination of frequently co-occurring words. RESULTS The top 20 nouns, verbs, and noun plus verb pairs co-occurring with Corona-no-sei were extracted. The top 5 keywords were graduation ceremony, cancel, school, work, and event. The top 5 verbs were disappear, go, rest, can go, and end. Our findings indicate that education emerged as the top concern when the Japanese government announced the first state of emergency. We also observed a sudden surge in anxiety about shortages of materials such as toilet paper. As the pandemic persisted and more states of emergency were declared, we noticed a shift toward long-term concerns, including careers, social relationships, and education. CONCLUSIONS Our study incorporated machine learning techniques for disease monitoring through the use of tweet data, allowing the identification of underlying concerns (eg, disrupted education and work conditions) throughout the 3 stages of Japanese government emergency announcements. 
The comparison with COVID-19 case numbers provides valuable insights into the short- and long-term societal impacts, emphasizing the importance of considering citizens' perspectives in policy-making and supporting those affected by the pandemic, particularly in the context of Japanese government decision-making.
4
Extracting Spatio-Temporal Trends in Medical Research Prioritization Through Natural Language Processing of Case Report Abstracts. Stud Health Technol Inform 2024; 310:634-638. [PMID: 38269886 DOI: 10.3233/shti231042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Medical research prioritization is an important aspect of decision-making by researchers and relevant stakeholders. The ever-increasing availability of technology and data has opened doors to new discoveries and new questions. This makes it difficult for researchers and relevant stakeholders to make well-informed decisions about the research areas they want to support and the nations with which they should seek collaborations. It is, therefore, useful to look at the spatio-temporal trends of medical research prioritization to gain insight into popular and neglected areas of research as well as how each nation allocates its priorities. In this study, we develop a system that collects, classifies, and summarizes case report abstracts according to the location, time, and disease category of the report. The additional classifications allow us to visualize and monitor the trends in medical research prioritization by location, time, and disease category.
5
Detection of Adverse Event Signals with Severity Grade Classification from Cancer Patient Narrative. Stud Health Technol Inform 2024; 310:554-558. [PMID: 38269870 DOI: 10.3233/shti231026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Adverse event (AE) management is crucial to improve anti-cancer treatment outcomes, but it is reported that some AE signals can be missed in clinical visits. Thus, monitoring AE signals seamlessly, including events outside hospitals, would be helpful for early intervention. Here we investigated how to detect AE signals from texts written by cancer patients themselves by developing deep-learning (DL) models to classify posts mentioning AEs according to severity grade, in order to focus on those that might need immediate treatment interventions. Using patient blogs written in Japanese by cancer patients as a data source, we built DL models based on three approaches, BERT, ELECTRA, and T5. Among these models, T5 showed the best F1 scores for both Grade ≥ 1 and ≥ 2 article classification tasks (0.85 and 0.53, respectively). This model might benefit patients by enabling earlier AE signal detection, thereby improving quality of life.
6
Narrowing the Patient-Physician Gap Based on Self-Reporting and Monthly Hepatologist Feedback for Patients With Alcohol-Related Liver Disease: Interventional Pilot Study Using a Journaling Smartphone App. JMIR Form Res 2023; 7:e44762. [PMID: 38113066 PMCID: PMC10762609 DOI: 10.2196/44762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 01/16/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND Screening and intervention for alcohol use disorders (AUDs) are recommended to improve the prognosis of patients with alcohol-related liver disease (ALD). Most patients' smartphone app diaries record drinking behavior for self-monitoring. A smartphone app can be expected to also be helpful for physicians because it can provide rich patient information to hepatologists, leading to suitable feedback. We conducted this prospective pilot study to assess the use of a smartphone app as a journaling tool and as a self-report-based feedback source for patients with ALD. OBJECTIVE The aims of this study were assessment of whether journaling (self-report) and self-report-based feedback can help patients maintain abstinence and improve liver function data. METHODS This pilot study used a newly developed smartphone journaling app for patients, with input data that physicians can review. After patients with ALD were screened for harmful alcohol use, some were invited to use the smartphone journaling app for 8 weeks. Their self-reported alcohol intake, symptoms, and laboratory data were recorded at entry, week 4, and week 8. Biomarkers for alcohol use included gamma glutamyl transferase (GGT), percentage of carbohydrate-deficient transferrin to transferrin (%CDT), and GGT-CDT (GGT-CDT= 0.8 × ln[GGT] + 1.3 × ln[%CDT]). At each visit, their recorded data were reviewed by a hepatologist to evaluate changes in alcohol consumption and laboratory data. The relation between those outcomes and app usage was also investigated. RESULTS Of 14 patients agreeing to participate, 10 completed an 8-week follow-up, with diary input rates between 44% and 100% of the expected days. Of the 14 patients, 2 withdrew from clinical follow-up, and 2 additional patients never used the smartphone journaling app. Using the physician's view, a treating hepatologist gave feedback via comments to patients at each visit. 
Mean self-reported alcohol consumption dropped from baseline (100, SD 70 g) to week 4 (13, SD 25 g; P=.002) and remained lower at week 8 (13, SD 23 g; P=.007). During the study, 5 patients reported complete abstinence. No significant changes were found in mean GGT and mean %CDT alone, but the mean GGT-CDT combination dropped significantly from entry (5.2, SD 1.2) to the week 4 visit (4.8, SD 1.1; P=.02) and at week 8 (4.8, SD 1.0; P=.01). During the study period, decreases in mean total bilirubin (3.0, SD 2.4 mg/dL to 2.4, SD 1.9 mg/dL; P=.01) and increases in mean serum albumin (3.0, SD 0.9 g/dL to 3.3, SD 0.8 g/dL; P=.009) were recorded. CONCLUSIONS These pilot study findings revealed that a short-term intervention with a smartphone journaling app used by both patients and treatment-administering hepatologists was associated with reduced drinking and improved liver function. TRIAL REGISTRATION UMIN CTR UMIN000045285; http://tinyurl.com/yvvk38tj.
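The combined biomarker defined in the METHODS above, GGT-CDT = 0.8 × ln[GGT] + 1.3 × ln[%CDT], can be computed as in the minimal sketch below. The function name and the input check are illustrative assumptions; only the formula comes from the abstract.

```python
import math


def ggt_cdt(ggt: float, cdt_percent: float) -> float:
    """Combined marker from the abstract: 0.8 * ln(GGT) + 1.3 * ln(%CDT)."""
    if ggt <= 0 or cdt_percent <= 0:
        # The natural logarithm is undefined for non-positive inputs.
        raise ValueError("GGT and %CDT must be positive")
    return 0.8 * math.log(ggt) + 1.3 * math.log(cdt_percent)


# Hypothetical input values, chosen only to exercise the formula.
print(round(ggt_cdt(100.0, 2.0), 2))
```

Because both terms are logarithmic, the combined score changes by a fixed amount for a given proportional change in either marker, whatever its starting level.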
7
Diagnosing psychiatric disorders from history of present illness using a large-scale linguistic model. Psychiatry Clin Neurosci 2023; 77:597-604. [PMID: 37526294 DOI: 10.1111/pcn.13580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 07/22/2023] [Accepted: 07/27/2023] [Indexed: 08/02/2023]
Abstract
AIM Recent advances in natural language processing models are expected to provide diagnostic assistance in psychiatry from the history of present illness (HPI). However, existing studies have been limited by covering only major diseases, using small sample sizes, or lacking a comparison with diagnoses made by psychiatrists to ensure accuracy. Therefore, we formulated an accurate diagnostic model that covers all psychiatric disorders. METHODS HPIs and diagnoses were extracted from discharge summaries of 2,642 cases at the Nara Medical University Hospital, Japan, from 21 May 2007 to 31 May 2021. The diagnoses were classified into 11 classes according to the code from ICD-10 Chapter V. Using UTH-BERT pre-trained on the electronic medical records of the University of Tokyo Hospital, Japan, we predicted the main diagnoses at discharge based on HPIs and compared the concordance rate with the results of psychiatrists. The psychiatrists were divided into two groups: semi-Designated with 3-4 years of experience and Residents with only 2 months of experience. RESULTS The model's match rate was 74.3%, compared to 71.5% for the semi-Designated psychiatrists and 69.4% for the Residents. If the cases were limited to those correctly answered by the semi-Designated group, the model and the Residents performed at 84.9% and 83.3%, respectively. CONCLUSION We demonstrated that the diagnosis the model predicted from the HPI matched the principal diagnosis at discharge with high probability. Hence, the model can provide diagnostic suggestions in actual clinical practice.
8
Adverse event signal extraction from cancer patients' narratives focusing on impact on their daily-life activities. Sci Rep 2023; 13:15516. [PMID: 37726371 PMCID: PMC10509234 DOI: 10.1038/s41598-023-42496-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Accepted: 09/11/2023] [Indexed: 09/21/2023] Open
Abstract
Adverse event (AE) management is important to improve anti-cancer treatment outcomes, but it is known that some AE signals can be missed during clinical visits. In particular, AEs that affect patients' activities of daily living (ADL) need careful monitoring as they may require immediate medical intervention. This study aimed to build deep-learning (DL) models for extracting signals of AEs limiting ADL from patients' narratives. The data source was blog posts written in Japanese by breast cancer patients. After pre-processing and annotation for AE signals, three DL models (BERT, ELECTRA, and T5) were trained and tested in three different approaches for AE signal identification. The performances of the trained models were evaluated in terms of precision, recall, and F1 scores. From 2,272 blog posts, 191 and 702 articles were identified as describing AEs limiting ADL or not limiting ADL, respectively. Among the tested DL models and approaches, T5 showed the best F1 scores for identifying articles with AE limiting ADL or all AE: 0.557 and 0.811, respectively. The most frequent AE signals were "pain or numbness", "fatigue" and "nausea". Our results suggest that this AE monitoring scheme focusing on patients' ADL has potential to reinforce current AE management provided by medical staff.
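The abstract above reports precision, recall, and F1 scores. As a reminder of how these relate, the sketch below computes all three from confusion counts using the standard definitions. The counts in the usage line are invented for illustration and are not taken from the study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; each value falls back to 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
print(precision_recall_f1(8, 2, 2))
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a common summary when the positive class is rare, as with the 191 ADL-limiting articles among 2,272 posts.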
9
Exploring the use of AI text-to-image generation to downregulate negative emotions in an expressive writing application. ROYAL SOCIETY OPEN SCIENCE 2023; 10:220238. [PMID: 36636309 PMCID: PMC9810434 DOI: 10.1098/rsos.220238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 11/30/2022] [Indexed: 06/17/2023]
Abstract
Conventional writing therapies are versatile, accessible and easy to facilitate online, but often require participants to self-disclose traumatic experiences. To make expressive writing therapies safer for online, unsupervised environments, we explored the use of text-to-image generation as a means to downregulate negative emotions during a fictional writing exercise. We developed a writing tool, StoryWriter, that uses Generative Adversarial Network models to generate artwork from users' narratives in real time. These images were intended to positively distract users from their negative emotions throughout the writing task. In this paper, we report the outcomes of two user studies: Study 1 (N = 388), which experimentally examined the efficacy of this application via negative versus neutral emotion induction and image generation versus no image generation control groups; and Study 2 (N = 54), which qualitatively examined open-ended feedback. Our results are heterogeneous: both studies suggested that StoryWriter somewhat contributed to improved emotion outcomes for participants with pre-existing negative emotions, but users' open-ended responses indicated that these outcomes may be adversely modulated by the generated images, which could undermine the therapeutic benefits of the writing task itself.
10
The Disruption of the Cystic Fibrosis Community’s Experiences and Concerns during the COVID-19 Pandemic: Topic Modeling and Time Series Analysis of Reddit Comments (Preprint). J Med Internet Res 2022; 25:e45249. [PMID: 37079359 PMCID: PMC10160941 DOI: 10.2196/45249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/18/2023] Open
Abstract
BACKGROUND The COVID-19 pandemic disrupted the needs and concerns of the cystic fibrosis community. Patients with cystic fibrosis were particularly vulnerable during the pandemic due to overlapping symptoms in addition to the challenges patients with rare diseases face, such as the need for constant medical aid and limited information regarding their disease or treatments. Even before the pandemic, patients vocalized these concerns on social media platforms like Reddit and formed communities and networks to share insight and information. This data can be used as a quick and efficient source of information about the experiences and concerns of patients with cystic fibrosis in contrast to traditional survey- or clinical-based methods. OBJECTIVE This study applies topic modeling and time series analysis to identify the disruption caused by the COVID-19 pandemic and its impact on the cystic fibrosis community's experiences and concerns. This study illustrates the utility of social media data in gaining insight into the experiences and concerns of patients with rare diseases. METHODS We collected comments from the subreddit r/CysticFibrosis to represent the experiences and concerns of the cystic fibrosis community. The comments were preprocessed before being used to train the BERTopic model to assign each comment to a topic. The number of comments and active users for each data set was aggregated monthly per topic and then fitted with an autoregressive integrated moving average (ARIMA) model to study the trends in activity. To verify the disruption in trends during the COVID-19 pandemic, we assigned a dummy variable in the model where a value of "1" was assigned to months in 2020 and "0" otherwise and tested for its statistical significance. RESULTS A total of 120,738 comments from 5827 users were collected from March 24, 2011, until August 31, 2022. We found 22 topics representing the cystic fibrosis community's experiences and concerns. 
Our time series analysis showed that for 9 topics, the COVID-19 pandemic was a statistically significant event that disrupted the trends in user activity. Of the 9 topics, only 1 showed significantly increased activity during this period, while the other 8 showed decreased activity. This mixture of increased and decreased activity for these topics indicates a shift in attention or focus on discussion topics during this period. CONCLUSIONS There was a disruption in the experiences and concerns the cystic fibrosis community faced during the COVID-19 pandemic. By studying social media data, we were able to quickly and efficiently study the impact on the lived experiences and daily struggles of patients with cystic fibrosis. This study shows how social media data can be used as an alternative source of information to gain insight into the needs of patients with rare diseases and how external factors disrupt them.
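The dummy variable described in the METHODS above ("1" for months in 2020, "0" otherwise) can be built as in the sketch below. The helper names are illustrative assumptions, and the ARIMA fit itself is left out; the resulting list would be supplied as an exogenous regressor to whichever ARIMA implementation is used.

```python
from datetime import date
from typing import List


def month_starts(start: date, end: date) -> List[date]:
    """First day of every calendar month from start's month to end's month."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def pandemic_dummy(months: List[date], year: int = 2020) -> List[int]:
    """1 for months that fall in the given year, 0 otherwise."""
    return [1 if m.year == year else 0 for m in months]


# Collection window reported in the abstract: March 24, 2011, to August 31, 2022.
months = month_starts(date(2011, 3, 24), date(2022, 8, 31))
dummy = pandemic_dummy(months)
print(len(months), sum(dummy))
```

Testing the coefficient on this regressor for statistical significance is what lets the authors say whether 2020 disrupted a topic's activity trend.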
11
Transferability Based on Drug Structure Similarity in Automatic Classification of Noncompliant Drug Use on Social Media: Natural Language Processing Approach (Preprint). J Med Internet Res 2022; 25:e44870. [PMID: 37133915 DOI: 10.2196/44870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 03/17/2023] [Accepted: 03/29/2023] [Indexed: 03/31/2023] Open
Abstract
BACKGROUND Medication noncompliance is a critical issue because of the increased number of drugs sold on the web. Web-based drug distribution is difficult to control, causing problems such as drug noncompliance and abuse. The existing medication compliance surveys lack completeness because it is impossible to cover patients who do not go to the hospital or provide accurate information to their doctors, so a social media-based approach is being explored to collect information about drug use. Social media data, which includes information on drug usage by users, can be used to detect drug abuse and medication compliance in patients. OBJECTIVE This study aimed to assess how the structural similarity of drugs affects the efficiency of machine learning models for text classification of drug noncompliance. METHODS This study analyzed 22,022 tweets about 20 different drugs. The tweets were labeled as either noncompliant use or mention, noncompliant sales, general use, or general mention. The study compares 2 methods for training machine learning models for text classification: single-sub-corpus transfer learning, in which a model is trained on tweets about a single drug and then tested on tweets about other drugs, and multi-sub-corpus incremental learning, in which models are trained on tweets about drugs in order of their structural similarity. The performance of a machine learning model trained on a single subcorpus (a data set of tweets about a specific category of drugs) was compared to the performance of a model trained on multiple subcorpora (data sets of tweets about multiple categories of drugs). RESULTS The results showed that the performance of the model trained on a single subcorpus varied depending on the specific drug used for training. The Tanimoto similarity (a measure of the structural similarity between compounds) was weakly correlated with the classification results. 
The model trained by transfer learning on a corpus of drugs with close structural similarity performed better than the model trained by randomly adding a subcorpus when the number of subcorpora was small. CONCLUSIONS The results suggest that structural similarity improves the classification performance of messages about unknown drugs if the drugs in the training corpus are few. On the other hand, this indicates that there is little need to consider the influence of the Tanimoto structural similarity if a sufficient variety of drugs is ensured.
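The Tanimoto similarity mentioned above is, for binary structural fingerprints, the number of shared features divided by the number of features present in either compound. The sketch below assumes fingerprints are represented as sets of integer feature indices; the drug names and fingerprints are hypothetical, and ordering candidates by similarity to a reference drug is only one plausible reading of "in order of their structural similarity".

```python
from typing import AbstractSet, Dict, List


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """|A intersect B| / |A union B|; 0.0 for two empty sets (a convention assumed here)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def order_by_similarity(reference: AbstractSet[int],
                        candidates: Dict[str, AbstractSet[int]]) -> List[str]:
    """Candidate names sorted from most to least similar to the reference."""
    return sorted(candidates,
                  key=lambda name: tanimoto(reference, candidates[name]),
                  reverse=True)


# Hypothetical fingerprints, for illustration only.
reference = {1, 2, 3, 4}
candidates = {"drug_a": {1, 2, 3, 5}, "drug_b": {1, 9}, "drug_c": {1, 2, 3, 4, 6}}
print(order_by_similarity(reference, candidates))
```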
12
Measuring concerns about the COVID-19 vaccine among Japanese internet users through search queries. Sci Rep 2022; 12:15037. [PMID: 36057657 PMCID: PMC9440921 DOI: 10.1038/s41598-022-18307-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2022] [Accepted: 08/09/2022] [Indexed: 11/09/2022] Open
Abstract
With the increasing availability of the COVID-19 vaccines, vaccination has been rapidly promoted globally as a countermeasure against the spread of COVID-19. In Japan, vaccination was first introduced in February 2021. However, the level of concern about vaccination differs between individuals, and topics of concern include adverse reactions and side effects. This study investigated attitudes toward vaccines or vaccination during the COVID-19 pandemic across different Japanese prefectures, using Yahoo! JAPAN search queries. We first defined a vaccine concern index (VCI) by aggregating the search counts of vaccine-related queries from Yahoo! JAPAN users before examining VCI across all Japanese prefectures, accounting for gender and age. Our results demonstrated that VCI tended to be lower in more populated areas, and VCI was higher among users in their 20s to 40s than among older users, especially among female users. Furthermore, there was a significant positive correlation (Spearman's Rank correlation coefficient [Formula: see text] = 0.60, [Formula: see text]) between VCI and prefectural vaccination rate, suggesting that web searching of adverse vaccine reactions may precede actual vaccination. This could reflect the information-seeking behavior of individuals who are accepting of vaccinations.
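Spearman's rank correlation, the statistic reported above, is the Pearson correlation of the two variables' ranks. The sketch below implements it in plain Python with average ranks for ties. The function names and the sample values are illustrative assumptions and do not reproduce the study's data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-prefecture values, for illustration only.
vci = [0.9, 1.4, 1.1, 2.0]
vaccination_rate = [61.0, 70.0, 68.0, 75.0]
print(spearman(vci, vaccination_rate))
```

Working on ranks makes the statistic sensitive to monotonic association rather than strictly linear association, which suits index-type measures such as the VCI.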
13
Between Fact and Fabrication: How Visual Art Might Nurture Environmental Consciousness. Front Psychol 2022; 13:925843. [PMID: 35959074 PMCID: PMC9360767 DOI: 10.3389/fpsyg.2022.925843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 06/06/2022] [Indexed: 11/29/2022] Open
Abstract
Previous studies have highlighted the communicative limitations of artistic visualizations, which are often too conceptual or interpretive to enhance public understanding of (and volition to act upon) scientific climate information. This seems to suggest a need for greater factuality/concreteness in artistic visualization projects, which may indeed be the case. However, in this paper, we synthesize insights from environmental psychology, the psychology of art, and intermediate disciplines like eco-aesthetics, to argue that artworks—defined by their counterfactual qualities—can be effective for stimulating elements of environmental consciousness. We also argue that different artworks may yield different effects depending on how they combine counter/factual strategies. In so doing, we assert that effective artistic perceptualization—here expressed as affectivization—exceeds the faithful translation of facts from one mode to another, and cannot be encapsulated in a single example of un/successful art.
14
Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer. BMC Med Inform Decis Mak 2022; 22:158. [PMID: 35717167 PMCID: PMC9206132 DOI: 10.1186/s12911-022-01897-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/22/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022] Open
Abstract
Background Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis. Materials and methods Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We proposed a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis. Results The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 for most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy; however, the statistical analysis step achieved low performance because abstracts do not always contain all the required information. Conclusion We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis, and the system achieved relatively good performance, though more substantiation is required.
|
15
|
Clinical Comparable Corpus Describing the Same Subjects with Different Expressions. Stud Health Technol Inform 2022; 290:253-257. [PMID: 35673012 DOI: 10.3233/shti220073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Medical artificial intelligence (AI) systems need to learn to recognize synonyms or paraphrases describing the same anatomy, disease, treatment, etc. to better understand real-world clinical documents. Existing linguistic resources focus on variants at the word or sentence level. To handle linguistic variations on a broader scale, we proposed the Medical Text Radiology Report section Japanese version (MedTxt-RR-JA), the first clinical comparable corpus. MedTxt-RR-JA was built by recruiting nine radiologists to diagnose the same 15 lung cancer cases in Radiopaedia, an open-access radiological repository. The 135 radiology reports in MedTxt-RR-JA were shown to contain word-, sentence- and document-level variations maintaining similarity of contents. MedTxt-RR-JA is also the first publicly available Japanese radiology report corpus that would help to overcome poor data availability for Japanese medical AI systems. Moreover, our methodology can be applied widely to building clinical corpora without privacy concerns.
|
16
|
AUTOMETA: Automatic Meta-Analysis System Employing Natural Language Processing. Stud Health Technol Inform 2022; 290:612-616. [PMID: 35673089 DOI: 10.3233/shti220150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Meta-analyses examine the results of different clinical studies to determine whether a treatment is effective or not. Meta-analyses provide the gold standard for medical evidence. Despite their importance, meta-analyses are time-consuming, which poses a challenge where timeliness is important. The number of research articles is also increasing rapidly, and most meta-analyses become outdated after publication because they have not incorporated new evidence. Therefore, there is increasing interest in automating meta-analysis to speed up the process and allow for automatic updates when new results are available. In this preliminary study we present AUTOMETA, our proposed system for automating meta-analysis, which employs existing natural language processing methods for identifying Participants, Intervention, Control, and Outcome (PICO) elements. We show that our system can perform advanced meta-analyses by parsing numeric outcomes to identify the number of patients having certain outcomes. We also present a new dataset that improves on previous datasets by incorporating additional tags to identify detailed information.
|
17
|
Abstract
OBJECTIVES Owing to the rapid progress of natural language processing (NLP), the role of NLP in the medical field has gained considerable attention from both the NLP and medical informatics communities. Although numerous medical NLP papers are published annually, there is still a gap between basic NLP research and practical product development. This gap raises questions such as what medical NLP has achieved in each medical field and what obstacles stand in the way of the practical use of NLP. This paper aims to answer these questions. METHODS We explore the literature on potential NLP products/services applied to various medical/clinical/healthcare areas. RESULTS This paper introduces clinical (bedside) applications, describing the use of NLP in each clinical department: internal medicine, pre-surgery, post-surgery, oncology, radiology, pathology, psychiatry, rehabilitation, and obstetrics and gynecology. We also clarify the technical problems that must be addressed to encourage NLP-based bedside applications. CONCLUSIONS These results contribute to discussions regarding potentially feasible NLP applications and highlight research gaps for future studies.
|
18
|
Monitoring Mentions of COVID-19 Vaccine Side Effects from Japanese and Indonesian Twitter: Infodemiological Study (Preprint). JMIR INFODEMIOLOGY 2022; 2:e39504. [PMID: 36277140 PMCID: PMC9578292 DOI: 10.2196/39504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 07/29/2022] [Accepted: 09/19/2022] [Indexed: 11/13/2022]
Abstract
Background The year 2021 was marked by vaccinations against COVID-19, which spurred wider discussion among the general population, with some in favor and some against vaccination. Twitter, a popular social media platform, was instrumental in providing information about the COVID-19 vaccine and has been effective in observing public reactions. We focused on tweets from Japan and Indonesia, 2 countries with a large Twitter-using population, where concerns about side effects were consistently stated as a strong reason for vaccine hesitancy. Objective This study aimed to investigate how Twitter was used to report vaccine-related side effects and to compare the mentions of these side effects from 2 messenger RNA (mRNA) vaccine types developed by Pfizer and Moderna, in Japan and Indonesia. Methods We obtained tweet data from Twitter using Japanese and Indonesian keywords related to COVID-19 vaccines and their side effects from January 1, 2021, to December 31, 2021. We then removed users with a high frequency of tweets and merged the tweets from multiple users as a single sentence to focus on user-level analysis, resulting in a total of 214,165 users (Japan) and 12,289 users (Indonesia). Then, we filtered the data to select tweets mentioning Pfizer or Moderna only and removed tweets mentioning both. We compared the side effect counts to the public reports released by Pfizer and Moderna. Afterward, logistic regression models were used to compare the side effects for the Pfizer and Moderna vaccines for each country. Results We observed some differences in the ratio of side effects between the public reports and tweets. Specifically, fever was mentioned much more frequently in tweets than would be expected based on the public reports. 
We also observed differences in side effects reported between Pfizer and Moderna vaccines from Japan and Indonesia, with more side effects reported for the Pfizer vaccine in Japanese tweets and more side effects with the Moderna vaccine reported in Indonesian tweets. Conclusions We note the possible consequences of vaccine side effect surveillance on Twitter and information dissemination, in that fever appears to be over-represented. This could be due to fever possibly having a higher severity or measurability, and further implications are discussed.
|
19
|
Identification of hand-foot syndrome from cancer patients' blog posts: BERT-based deep-learning approach to detect potential adverse drug reaction symptoms. PLoS One 2022; 17:e0267901. [PMID: 35507636 PMCID: PMC9067685 DOI: 10.1371/journal.pone.0267901] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2021] [Accepted: 04/18/2022] [Indexed: 12/29/2022] Open
Abstract
Early detection and management of adverse drug reactions (ADRs) is crucial for improving patients' quality of life. Hand-foot syndrome (HFS) is one of the most problematic ADRs for cancer patients. Recently, an increasing number of patients post their daily experiences to internet communities, for example in blogs, where potential ADR signals not captured through routine clinic visits can be described. Therefore, this study aimed to identify patients with potential ADRs, focusing on HFS, from internet blogs by using natural language processing (NLP) deep-learning methods. From 10,646 blog posts, written in Japanese by cancer patients, 149 HFS-positive sentences were extracted after pre-processing, annotation and scrutiny by a certified oncology pharmacist. The HFS-positive sentences described not only typical HFS expressions like "pain" or "spoon nail", but also unique patient-derived expressions such as onomatopoeic ones. The dataset was divided at a 4 to 1 ratio and used to train and evaluate three NLP deep-learning models: long short-term memory (LSTM), bidirectional LSTM and bidirectional encoder representations from transformers (BERT). The BERT model gave the best performance, with precision 0.63, recall 0.82 and F1 score 0.71 in the HFS user identification task. Our results demonstrate that this NLP deep-learning model can successfully identify patients with potential HFS from blog posts, where patients' own wording on symptoms or impacts on their daily lives is described. Thus, it should be feasible to utilize patient-generated text data to improve ADR management for individual patients.
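As a quick consistency check on the metrics reported above, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the function name `f1_score` is our own illustration and is not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the BERT model in the abstract.
reported_f1 = round(f1_score(0.63, 0.82), 2)
print(reported_f1)
```

Rounded to two decimals, the result agrees with the F1 score of 0.71 stated in the abstract.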
|
20
|
Extracting Multiple Worries from Breast Cancer Patient Blogs Using Multi-Label Classification with a Natural Language-Processing Model BERT (Bidirectional Encoder Representations from Transformers): Infodemiology Study of Blogs (Preprint). JMIR Cancer 2022; 8:e37840. [PMID: 35657664 PMCID: PMC9206207 DOI: 10.2196/37840] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/11/2022] [Revised: 05/10/2022] [Accepted: 05/23/2022] [Indexed: 12/26/2022] Open
Abstract
Background Patients with breast cancer have a variety of worries and need multifaceted information support. Their accumulated posts on social media contain rich descriptions of their daily worries concerning issues such as treatment, family, and finances. It is important to identify these issues to help patients with breast cancer to resolve their worries and obtain reliable information. Objective This study aimed to extract and classify multiple worries from text generated by patients with breast cancer using Bidirectional Encoder Representations From Transformers (BERT), a context-aware natural language processing model. Methods A total of 2272 blog posts by patients with breast cancer in Japan were collected. Five worry labels, “treatment,” “physical,” “psychological,” “work/financial,” and “family/friends,” were defined and assigned to each post. Multiple labels were allowed. To assess the label criteria, 50 blog posts were randomly selected and annotated by two researchers with medical knowledge. After the interannotator agreement had been assessed by means of Cohen kappa, one researcher annotated all the blogs. A multilabel classifier that simultaneously predicts five worries in a text was developed using BERT. This classifier was fine-tuned by using the posts as input and adding a classification layer to the pretrained BERT. The performance was evaluated for precision using the average of 5-fold cross-validation results. Results Among the blog posts, 477 included “treatment,” 1138 included “physical,” 673 included “psychological,” 312 included “work/financial,” and 283 included “family/friends.” The interannotator agreement values were 0.67 for “treatment,” 0.76 for “physical,” 0.56 for “psychological,” 0.73 for “work/financial,” and 0.73 for “family/friends,” indicating a high degree of agreement. Among all blog posts, 544 contained no label, 892 contained one label, and 836 contained multiple labels. 
The worries varied from user to user, and the worries posted by the same user changed over time. The model performed well, though prediction performance differed for each label. The values of precision were 0.59 for “treatment,” 0.82 for “physical,” 0.64 for “psychological,” 0.67 for “work/financial,” and 0.58 for “family/friends.” The higher the interannotator agreement and the greater the number of posts, the higher the precision tended to be. Conclusions This study showed that the BERT model can extract multiple worries from text generated by patients with breast cancer. This is the first application of a multilabel classifier using the BERT model to extract multiple worries from patient-generated text. The results will be helpful to identify breast cancer patients’ worries and give them timely social support.
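The interannotator agreement in this abstract is measured with Cohen kappa, which corrects the observed agreement between two annotators for the agreement expected by chance. The following is a minimal sketch for a single binary worry label; the two annotation lists are hypothetical values invented for illustration and are not data from the study.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations (1 = post mentions the worry, 0 = it does not).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

In a multilabel setting such as the one described, this calculation would be repeated once per label, which is consistent with the abstract reporting a separate kappa value for each of the five worries.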
|
21
|
Exploring Relationships Between Tweet Numbers and Over-the-counter Drug Sales for Allergic Rhinitis: Retrospective Analysis. JMIR Form Res 2022; 6:e33941. [PMID: 35107434 PMCID: PMC8851323 DOI: 10.2196/33941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/19/2021] [Accepted: 01/04/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Health-related social media data are increasingly being used in disease surveillance studies. In particular, surveillance of infectious diseases such as influenza has demonstrated high correlations between the number of social media posts mentioning the disease and the number of patients who went to the hospital and were diagnosed with the disease. However, the prevalence of some diseases, such as allergic rhinitis, cannot be estimated based on the number of patients alone. Specifically, individuals with allergic rhinitis typically self-medicate by taking over-the-counter (OTC) medications without going to the hospital. Although allergic rhinitis is not a life-threatening disease, it represents a major social problem because it reduces people's quality of life, making it essential to understand its prevalence and people's motives for self-medication behavior. OBJECTIVE This study aims to explore the relationship between the number of social media posts mentioning the main symptoms of allergic rhinitis and the sales volume of OTC rhinitis medications in Japan. METHODS We collected tweets over 4 years (from 2017 to 2020) that included keywords corresponding to the main nasal symptoms of allergic rhinitis: "sneezing," "runny nose," and "stuffy nose." We also obtained the sales volume of OTC drugs, including oral medications and nasal sprays, for the same period. We then calculated the Pearson correlation coefficient between time series data on the number of tweets per week and time series data on the sales volume of OTC drugs per week. RESULTS The results showed a much higher correlation (r=0.8432) between the time series data on the number of tweets mentioning "stuffy nose" and the time series data on the sales volume of nasal sprays than for the other two symptoms. There was also a high correlation (r=0.9317) between the seasonal components of these time series data. 
CONCLUSIONS We investigated the relationships between social media data and behavioral patterns, such as OTC drug sales volume. Exploring these relationships can help us understand the prevalence of allergic rhinitis and the motives for self-care treatment using social media data, which would be useful as a marketing indicator to reduce the number of out-of-stocks in stores, provide (sell) rhinitis medicines to consumers in a stable manner, and reduce the loss of sales opportunities. In the future, in-depth investigations are required to estimate sales volume using social media data, and future research could investigate other diseases and countries.
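The method in this abstract correlates two weekly time series using the Pearson correlation coefficient. The sketch below shows that calculation in plain Python; the weekly tweet and sales figures are hypothetical numbers made up for illustration, and the function name `pearson_r` is our own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly counts (not data from the study).
weekly_tweets = [120, 150, 310, 480, 260, 140]
weekly_sales = [30, 34, 61, 95, 52, 33]
r = pearson_r(weekly_tweets, weekly_sales)
print(round(r, 3))
```

The abstract also reports a correlation between the seasonal components of the two series; that step would require a seasonal decomposition before applying the same function, which is not shown here.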
|
22
|
A clinical specific BERT developed using a huge Japanese clinical text corpus. PLoS One 2021; 16:e0259763. [PMID: 34752490 PMCID: PMC8577751 DOI: 10.1371/journal.pone.0259763] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/03/2020] [Accepted: 10/26/2021] [Indexed: 11/19/2022] Open
Abstract
Generalized language models that are pre-trained with a large corpus have achieved great performance on natural language tasks. While many pre-trained transformers for English are published, few models are available for Japanese text, especially in clinical medicine. In this work, we demonstrate the development of a clinical specific BERT model with a huge amount of Japanese clinical text and evaluate it on the NTCIR-13 MedWeb that has fake Twitter messages regarding medical concerns with eight labels. Approximately 120 million clinical texts stored at the University of Tokyo Hospital were used as our dataset. The BERT-base was pre-trained using the entire dataset and a vocabulary including 25,000 tokens. The pre-training was almost saturated at about 4 epochs, and the accuracies of Masked-LM and Next Sentence Prediction were 0.773 and 0.975, respectively. The developed BERT did not show significantly higher performance on the MedWeb task than the other BERT models that were pre-trained with Japanese Wikipedia text. The advantage of pre-training on clinical text may become apparent in more complex tasks on actual clinical text, and such an evaluation set needs to be developed.
|
23
|
Medical Needs Extraction for Breast Cancer Patients from Question and Answer Services: Natural Language Processing-Based Approach. JMIR Cancer 2021; 7:e32005. [PMID: 34709187 PMCID: PMC8587180 DOI: 10.2196/32005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Revised: 09/25/2021] [Accepted: 10/04/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND A large number of patient narratives are available on various web services. On web question and answer services, patient questions often relate to medical needs, and we expect these questions to provide clues for a better understanding of patients' medical needs. OBJECTIVE This study aimed to extract patients' needs and classify them into thematic categories. Clarifying patient needs is the first step in solving social issues that patients with cancer encounter. METHODS For this study, we used patient question texts containing the key phrase "breast cancer," available at the Yahoo! Japan question and answer service, Yahoo! Chiebukuro, which contains over 60,000 questions on cancer. First, we converted the question text into a vector representation. Next, the relevance between patient needs and existing cancer needs categories was calculated based on cosine similarity. RESULTS The proportion of correct classifications in our proposed method was approximately 70%. From the classification results, we identified the variety and number of patient needs. CONCLUSIONS We created 3 corpora to classify the problems of patients with cancer. The proposed method was able to classify the problems based on the question text. Moreover, as an application example, question texts that signaled drug side effects and the unmet needs of cancer patients could be extracted. Revealing these needs is important to fulfill the medical needs of patients with cancer.
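The two method steps above (vectorize the question, then score it against each needs category by cosine similarity) can be sketched as follows. The abstract does not state which vector representation was used, so this example assumes simple bag-of-words counts; the category names and keyword lists are likewise hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(question_tokens, category_tokens):
    """Return the best-matching category and the score for every category."""
    question_vec = Counter(question_tokens)
    scores = {
        name: cosine_similarity(question_vec, Counter(tokens))
        for name, tokens in category_tokens.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical needs categories (assumed for illustration only).
categories = {
    "treatment": ["surgery", "chemotherapy", "drug", "side", "effect"],
    "finances": ["cost", "insurance", "work", "leave"],
}
label, scores = classify(["chemotherapy", "side", "effect", "hair"], categories)
print(label, scores)
```

A dense embedding could replace the count vectors without changing the scoring step, since cosine similarity only needs a dot product and two norms.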
|
24
|
Estimation of Psychological Distress in Japanese Youth Through Narrative Writing: Text-Based Stylometric and Sentiment Analyses. JMIR Form Res 2021; 5:e29500. [PMID: 34387556 PMCID: PMC8391726 DOI: 10.2196/29500] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/13/2021] [Revised: 06/29/2021] [Accepted: 07/06/2021] [Indexed: 11/13/2022] Open
Abstract
Background Internalizing mental illnesses associated with psychological distress are often underdetected. Text-based detection using natural language processing (NLP) methods is increasingly being used to complement conventional detection efforts. However, these approaches often rely on self-disclosure through autobiographical narratives that may not always be possible, especially in the context of the collectivistic Japanese culture. Objective We propose the use of narrative writing as an alternative resource for mental illness detection in youth. Accordingly, in this study, we investigated the textual characteristics of narratives written by youth with psychological distress; our research focuses on the detection of psychopathological tendencies in written imaginative narratives. Methods Using NLP tools such as stylometric measures and lexicon-based sentiment analysis, we examined short narratives from 52 Japanese youth (mean age 19.8 years, SD 3.1) obtained through crowdsourcing. Participants wrote a short narrative introduction to an imagined story before completing a questionnaire to quantify their tendencies toward psychological distress. Based on this score, participants were categorized into higher distress and lower distress groups. The written narratives were then analyzed using NLP tools and examined for between-group differences. Although outside the scope of this study, we also carried out a supplementary analysis of narratives written by adults using the same procedure. Results Youth demonstrating higher tendencies toward psychological distress used significantly more positive (happiness-related) words, revealing differences in valence of the narrative content. No other significant differences were observed between the high and low distress groups. Conclusions Youth with tendencies toward mental illness were found to write more positive stories that contained more happiness-related terms. 
These results may potentially have widespread implications on psychological distress screening on online platforms, particularly in cultures such as Japan that are not accustomed to self-disclosure. Although the mechanisms that we propose in explaining our results are speculative, we believe that this interpretation paves the way for future research in online surveillance and detection efforts.
|
25
|
Measuring Public Concern About COVID-19 in Japanese Internet Users Through Search Queries: Infodemiological Study. JMIR Public Health Surveill 2021; 7:e29865. [PMID: 34174781 PMCID: PMC8294121 DOI: 10.2196/29865] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/23/2021] [Revised: 06/01/2021] [Accepted: 06/13/2021] [Indexed: 01/19/2023] Open
Abstract
Background COVID-19 has disrupted lives and livelihoods and caused widespread panic worldwide. Emerging reports suggest that people living in rural areas in some countries are more susceptible to COVID-19. However, there is a lack of quantitative evidence that can shed light on whether residents of rural areas are more concerned about COVID-19 than residents of urban areas. Objective This infodemiology study investigated attitudes toward COVID-19 in different Japanese prefectures by aggregating and analyzing Yahoo! JAPAN search queries. Methods We measured COVID-19 concerns in each Japanese prefecture by aggregating search counts of COVID-19–related queries of Yahoo! JAPAN users and data related to COVID-19 cases. We then defined two indices—the localized concern index (LCI) and localized concern index by patient percentage (LCIPP)—to quantitatively represent the degree of concern. To investigate the impact of emergency declarations on people's concerns, we divided our study period into three phases according to the timing of the state of emergency in Japan: before, during, and after. In addition, we evaluated the relationship between the LCI and LCIPP in different prefectures by correlating them with prefecture-level indicators of urbanization. Results Our results demonstrated that the concerns about COVID-19 in the prefectures changed in accordance with the declaration of the state of emergency. The correlation analyses also indicated that the differentiated types of public concern measured by the LCI and LCIPP reflect the prefectures’ level of urbanization to a certain extent (ie, the LCI appears to be more suitable for quantifying COVID-19 concern in urban areas, while the LCIPP seems to be more appropriate for rural areas). Conclusions We quantitatively defined Japanese Yahoo users’ concerns about COVID-19 by using the search counts of COVID-19–related search queries. Our results also showed that the LCI and LCIPP have external validity.
|
26
|
Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT. Methods Inf Med 2021; 60:e56-e64. [PMID: 34237783 PMCID: PMC8294940 DOI: 10.1055/s-0041-1731390] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/02/2021] [Accepted: 05/18/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND Semantic textual similarity (STS) captures the degree of semantic similarity between texts. It plays an important role in many natural language processing applications such as text summarization, question answering, machine translation, information retrieval, dialog systems, plagiarism detection, and query ranking. STS has been widely studied in the general English domain. However, there exist few resources for STS tasks in the clinical domain and in languages other than English, such as Japanese. OBJECTIVE The objective of this study is to capture semantic similarity between Japanese clinical texts (Japanese clinical STS) by creating a Japanese dataset that is publicly available. MATERIALS We created two datasets for Japanese clinical STS: (1) Japanese case reports (CR dataset) and (2) Japanese electronic medical records (EMR dataset). The CR dataset was created from publicly available case reports extracted from the CiNii database. The EMR dataset was created from Japanese electronic medical records. METHODS We used an approach based on bidirectional encoder representations from transformers (BERT) to capture the semantic similarity between the clinical domain texts. BERT is a popular approach for transfer learning and has been proven to be effective in achieving high accuracy for small datasets. We implemented two Japanese pretrained BERT models: a general Japanese BERT and a clinical Japanese BERT. The general Japanese BERT is pretrained on Japanese Wikipedia texts while the clinical Japanese BERT is pretrained on Japanese clinical texts. RESULTS The BERT models performed well in capturing semantic similarity in our datasets. The general Japanese BERT outperformed the clinical Japanese BERT and achieved a high correlation with human scores (0.904 in the CR dataset and 0.875 in the EMR dataset). It was unexpected that the general Japanese BERT outperformed the clinical Japanese BERT on the clinical domain datasets.
This could be due to the fact that the general Japanese BERT is pretrained on a wide range of texts compared with the clinical Japanese BERT.
|
27
|
Abstract
Fake news can have a significant negative impact on society because of the growing use of mobile devices and the worldwide increase in Internet access. It is therefore essential to develop a simple mathematical model to understand the online dissemination of fake news. In this study, we propose a point process model of the spread of fake news on Twitter. The proposed model describes the spread of a fake news item as a two-stage process: initially, fake news spreads as a piece of ordinary news; then, when most users start recognizing the falsity of the news item, that itself spreads as another news story. We validate this model using two datasets of fake news items spread on Twitter. We show that the proposed model is superior to the current state-of-the-art methods in accurately predicting the evolution of the spread of a fake news item. Moreover, a text analysis suggests that our model appropriately infers the correction time, i.e., the moment when Twitter users start realizing the falsity of the news item. The proposed model contributes to understanding the dynamics of the spread of fake news on social media. Its ability to extract a compact representation of the spreading pattern could be useful in the detection and mitigation of fake news.
|
28
|
Identification of Adverse Drug Event-Related Japanese Articles: Natural Language Processing Analysis. JMIR Med Inform 2020; 8:e22661. [PMID: 33245290 PMCID: PMC7732716 DOI: 10.2196/22661] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/21/2020] [Revised: 10/05/2020] [Accepted: 10/28/2020] [Indexed: 12/23/2022] Open
Abstract
Background Medical articles covering adverse drug events (ADEs) are systematically reported by pharmaceutical companies for drug safety information purposes. Although policies governing reporting to regulatory bodies vary among countries and regions, all medical article reporting may be categorized as precision or recall based. Recall-based reporting, which is implemented in Japan, requires the reporting of any possible ADE. Therefore, recall-based reporting can introduce numerous false positives or substantial amounts of noise, a problem that is difficult to address using limited manual labor. Objective Our aim was to develop an automated system that could identify ADE-related medical articles, support recall-based reporting, and alleviate manual labor in Japanese pharmaceutical companies. Methods Using medical articles as input, our system based on natural language processing applies document-level classification to extract articles containing ADEs (replacing manual labor in the first screening) and sentence-level classification to extract sentences within those articles that imply ADEs (thus supporting experts in the second screening). We used 509 Japanese medical articles annotated by a medical engineer to evaluate the performance of the proposed system. Results Document-level classification yielded an F1 of 0.903. Sentence-level classification yielded an F1 of 0.413. These were averages of fivefold cross-validations. Conclusions A simple automated system may alleviate the manual labor involved in screening drug safety–related medical articles in pharmaceutical companies. After improving the accuracy of the sentence-level classification by considering a wider context, we intend to apply this system toward real-world postmarketing surveillance.
|
29
|
Surveillance of early stage COVID-19 clusters using search query logs and mobile device-based location information. Sci Rep 2020; 10:18680. [PMID: 33122686 PMCID: PMC7596075 DOI: 10.1038/s41598-020-75771-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/10/2020] [Accepted: 10/01/2020] [Indexed: 12/18/2022] Open
Abstract
Two clusters of the coronavirus disease 2019 (COVID-19) were confirmed in Hokkaido, Japan, in February 2020. To identify these clusters, this study employed web search query logs of multiple devices and user location information from location-aware mobile devices. We anonymously identified users who used a web search engine (i.e., Yahoo! JAPAN) to search for COVID-19 or its symptoms. We regarded them as web searchers who were suspicious of their own COVID-19 infection (WSSCI). We extracted the location of WSSCI via a mobile operating system application and compared the spatio-temporal distribution of WSSCI with the actual location of the two known clusters. In the early stage of cluster development, we confirmed several WSSCI. Our approach was accurate in this stage and became biased after a public announcement of the cluster development. When other cluster-related resources, such as detailed population statistics, are not available, the proposed metric can capture hints of emerging clusters.
|
30
|
Robust two-stage influenza prediction model considering regular and irregular trends. PLoS One 2020; 15:e0233126. [PMID: 32437380 PMCID: PMC7241782 DOI: 10.1371/journal.pone.0233126] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/18/2019] [Accepted: 04/28/2020] [Indexed: 11/18/2022] Open
Abstract
Influenza causes numerous deaths worldwide every year. Predicting the number of influenza patients is an important task for medical institutions. Two types of data regarding influenza-like illnesses (ILIs) are often used for flu prediction: (1) historical data and (2) user generated content (UGC) data on the web such as search queries and tweets. Historical data work well for regular seasonal trends but poorly for irregular phenomena. In contrast, UGC data are advantageous for irregular phenomena. So far, no effective model providing the benefits of both types of data has been devised. This study proposes a novel model, designated the two-stage model, which combines both historical and UGC data. The basic idea is that regular trends are first estimated using the historical data-based model, and irregular trends are then predicted by the UGC data-based model. Our approach is practically useful because the two models can be trained separately. Thus, if a UGC provider changes the service, our model could still produce better performance because the first part of the model remains stable. Experiments on the US and Japan datasets demonstrated the basic feasibility of the proposed approach. In the dropout (pseudo-noise) test that assumes a UGC service would change, the proposed method also showed robustness against outliers. The proposed model is suitable for prediction of seasonal flu.
|
31
|
Comparing Medical Term Usage Patterns of Professionals and Search Engine and Community Question Answering Service Users in Japan: Log Analysis. J Med Internet Res 2020; 22:e13369. [PMID: 32281938 PMCID: PMC7186863 DOI: 10.2196/13369] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/11/2019] [Revised: 11/12/2019] [Accepted: 02/04/2020] [Indexed: 11/25/2022] Open
Abstract
Background Despite increasing opportunities for acquiring health information online, discussion of the specific words used in searches has been limited. Objective The aim of this study was to clarify the medical information gap between medical professionals and the general public in Japan through health information–seeking activities on the internet. Methods Search and posting data were analyzed from one of the most popular domestic search engines in Japan (Yahoo! JAPAN Search) and the most popular Japanese community question answering service (Yahoo! Chiebukuro). We compared the frequency of 100 clinical words appearing in the clinical case reports of medical professionals (clinical frequency) with their frequency in Yahoo! JAPAN Search (search frequency) logs and questions posted to Yahoo! Chiebukuro (question frequency). The Spearman correlation coefficient was used to quantify association patterns among the three information sources. Additionally, user information (gender and age) in the search frequency associated with each registered user was extracted. Results Significant correlations were observed between clinical and search frequencies (r=0.29, P=.003), clinical and question frequencies (r=0.34, P=.001), and search and question frequencies (r=0.57, P<.001). Low-frequency words in clinical frequency (eg, “hypothyroidism,” “ulcerative colitis”) highly ranked in search frequency. Similarly, “pain,” “slight fever,” and “numbness” were highly ranked only in question frequency. The weighted average of ages was 34.5 (SD 2.7) years, and the weighted average of gender (man –1, woman +1) was 0.1 (SD 0.1) in search frequency. 
Some words were specifically extracted from the search frequency of certain age groups, including “abdominal pain” (10-20 years), “plasma cells” and “inflammatory findings” (20-30 years), “DM” (diabetes mellitus; 30-40 years), “abnormal shadow” and “inflammatory findings” (40-50 years), “hypertension” and “abnormal shadow” (50-60 years), and “lung cancer” and “gastric cancer” (60-70 years). Conclusions Search and question frequencies showed similar tendencies, whereas search and clinical frequencies showed discrepancy. Low-clinical frequency words related to diseases such as “hypothyroidism” and “ulcerative colitis” had high search frequencies, whereas those related to symptoms such as “pain,” “slight fever,” and “numbness” had high question frequencies. Moreover, high search frequency words included designated intractable diseases such as “ulcerative colitis,” which has an incidence of less than 0.1% in the Japanese population. Therefore, it is generally worthwhile to pay attention not only to major diseases but also to minor diseases that users frequently seek information on, and more words will need to be analyzed in the future. Some characteristic words for certain age groups were observed (eg, 20-40 years: “cancer”; 40-60 years: diagnoses and diseases identified in health examinations; 60-70 years: diseases with late adulthood onset and “death”). Overall, this analysis demonstrates that medical professionals as information providers should be aware of clinical frequency, and medical information gaps between professionals and the general public should be bridged.
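The rank-based comparison described in this abstract can be illustrated with a minimal sketch of the Spearman correlation coefficient. The word counts below are hypothetical placeholders, not data from the study, and the helper names (`average_ranks`, `pearson`, `spearman`) are introduced here only for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical frequencies of five clinical words in two sources.
clinical_frequency = [120, 45, 300, 8, 60]
search_frequency = [900, 1500, 700, 2000, 400]
rho = spearman(clinical_frequency, search_frequency)
print(f"Spearman correlation: {rho:.2f}")
```

Because only the rank order matters, the coefficient is insensitive to the very different scales of case-report counts and search-log counts, which is presumably why a rank correlation suits this kind of comparison.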
|
32
|
A survey of clarithromycin monotherapy and long-term administration of ethambutol for patients with MAC lung disease in Japan: A retrospective cohort study using the database of health insurance claims. Pharmacoepidemiol Drug Saf 2019; 29:427-432. [PMID: 31876044 DOI: 10.1002/pds.4951] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/30/2018] [Revised: 09/01/2019] [Accepted: 12/08/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND The number of patients with nontuberculous mycobacteriosis (NTM) has increased exponentially in recent years. In Japan, approximately 88.8% of patients with NTM suffer from Mycobacterium avium-intracellulare complex (MAC) lung disease. The incidence of MAC lung disease is increasing, particularly among middle-aged and elderly women, owing to a rapid increase in nontuberculous mycobacterial infections. The general treatment for MAC lung disease is chemotherapy. The type of chemotherapy recommended by specialists to prevent the development of a drug-resistant strain of the bacteria consists of a combination of clarithromycin (CAM), rifampicin, and ethambutol (EB). CAM monotherapy is contraindicated by specialists owing to its high potential to induce drug-resistant bacterial strains in patients with MAC lung disease. In addition, administering EB at doses not less than 1000 mg/d is not recommended, to avoid adverse drug reactions. However, it is unclear how many such treatment cases exist in real-world clinical settings because no long-term investigation has been carried out. MATERIALS AND METHODS This study investigated treatment with these drugs from 2005 to 2017 by studying 1135 patients with MAC lung disease based on a health insurance claims database. RESULTS Approximately 9.2% (101 cases) were prescribed long-term CAM monotherapy for 3 months or longer, and approximately 3.6% (18 cases) were prescribed high doses of EB. CONCLUSION CAM monotherapy over a long period of time is potentially detrimental to some patients. Better awareness of the types of treatments and their potential negative effects will be beneficial to clinical practitioners.
|
33
|
Quick Cognitive Impairment Test for Cancer Patients Using Emotional Stroop Effect. Stud Health Technol Inform 2019; 264:1629-1630. [PMID: 31438264 DOI: 10.3233/shti190568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Recent studies have attributed impaired cognitive function in cancer patients, or Cancer Related Cognitive Impairment (CRCI), to various causes. CRCI screening is vital for guiding important decisions about treatment options. This study investigates the emotional Stroop-test-based CRCI screening, examining response time when naming the colors of negative emotional words. Cancer patients (n=17) participated in two tests: (1) the Stroop task; (2) State-Trait Anxiety Inventory. Results suggest that Stroop-based CRCI screening is feasible.
|
34
|
KOTOBAKARI Study: Using Natural Language Processing of Patient Short Narratives to Detect Cancer Related Cognitive Impairment. Stud Health Technol Inform 2019; 264:1111-1115. [PMID: 31438097 DOI: 10.3233/shti190398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
BACKGROUND Recent studies have reported that the cognitive function of cancer patients often declines, a phenomenon designated as cancer related cognitive impairment (CRCI). Detecting CRCI is important for patients' decision-making. To this end, this study uses language-based CRCI screening to examine participants' language ability. OBJECTIVE This study was conducted to ascertain whether a Natural Language Processing (NLP) based system can detect CRCI. MATERIALS AND METHODS We obtained materials of two types from cancer patients (n = 116): (1) speech samples on three topics, and (2) cognitive function level test scores from Hasegawa's Dementia Scale - Revised (HDS-R), a test used in Japan for dementia patients. The test is similar to the Mini-Mental State Examination. RESULTS AND DISCUSSION Cancer patients with lower HDS-R scores showed a significantly lower Type Token Ratio (TTR). CONCLUSION This result demonstrates the feasibility of the proposed speech-language-based CRCI screening method.
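The Type Token Ratio mentioned in the results is the number of distinct words (types) divided by the total number of words (tokens). The sketch below assumes the transcript has already been split into tokens; for Japanese speech this would require a morphological analyzer, which is outside the scope of this illustration. The function name and the sample tokens are hypothetical.

```python
def type_token_ratio(tokens):
    """Return distinct tokens divided by total tokens (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical, already-tokenized utterance: 7 tokens, 6 distinct types.
sample_tokens = ["the", "garden", "was", "quiet", "and", "the", "morning"]
ttr = type_token_ratio(sample_tokens)
print(f"TTR: {ttr:.3f}")
```

TTR tends to fall as a sample gets longer, so comparisons across speakers are usually made on samples of similar length; whether the study controlled for length is not stated in the abstract.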
|
35
|
Clinical Characteristics of Heart Failure from Case Reports Presented at the Regional Meeting of the Japanese Society of Internal Medicine. Intern Med 2019; 58:2145-2150. [PMID: 31178494 PMCID: PMC6709326 DOI: 10.2169/internalmedicine.2583-18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022] Open
Abstract
Objective To examine case reports presented at the Regional Meeting of the Japanese Society of Internal Medicine in order to clarify the underlying disease and prognosis of heart failure, which is often caused by non-cardiovascular diseases. Methods We examined 49,693 case reports from the Japanese Society of Internal Medicine database. A total of 2,893 reports were included after excluding 46,022 reports that did not include the term "heart failure" and 778 reports with no indications of symptoms of heart failure. We assessed each patient's basal disease, and according to the abstracts, we reported their prognosis as dead or alive. Results Of the 2,893 reports included, 1,952 (67.5%) and 941 (32.5%) had cardiovascular and non-cardiovascular diseases as the causes, respectively; these cases were attributed to 725 different diseases, 196 (27.0%) and 529 (73.0%) of which were cardiovascular and non-cardiovascular diseases, respectively. In addition, 91 different side effects were identified. The percentage of cases of heart failure-related mortality was significantly higher among the patients with non-cardiovascular diseases than in those with cardiovascular diseases (17.8% vs. 10.8%; p <0.001). Of the diseases reported as causes of heart failure in more than 10 reports, pulmonary tumor thrombotic microangiopathy (87%), multiple myeloma (50%), and amyloidosis (47%) accounted for the highest percentages of heart failure-related mortality. Conclusion Because heart failure is often caused by non-cardiovascular diseases, a broad study of case reports on internal medicine is important for cardiologists.
|
36
|
Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations. J Med Internet Res 2019; 21:e12783. [PMID: 30785407 PMCID: PMC6401666 DOI: 10.2196/12783] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/09/2018] [Revised: 12/12/2018] [Accepted: 12/13/2018] [Indexed: 11/13/2022] Open
Abstract
Background The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Then, participants classify each tweet into 1 of the 2 categories: those containing a patient’s symptom and those that do not. Objective This study aimed to present the results of groups participating in a Japanese subtask, English subtask, and Chinese subtask along with discussions, to clarify the issues that need to be resolved in the field of medical NLP. Methods In summary, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In total, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using the exact match accuracy, F-measure based on precision and recall, and Hamming loss. Results The best system achieved 0.880 exact match accuracy, 0.920 F-measure, and 0.019 Hamming loss. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032, respectively. Conclusions This paper presented and discussed the performance of systems participating in the NTCIR-13 MedWeb task. 
As the MedWeb task settings can be formalized as the factualization of text, the achievement of this task could be directly applied to practical clinical applications.
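The three evaluation measures named in this abstract can be sketched for a multi-label setting as follows. The label matrices are hypothetical and use 3 labels instead of the 8 symptom labels for brevity. The abstract does not say how the F-measure was averaged across labels, so the micro-averaged version here is an assumption.

```python
def exact_match_accuracy(gold, pred):
    """Fraction of items whose full label vector is predicted correctly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def hamming_loss(gold, pred):
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(g) for g in gold)
    wrong = sum(a != b for g, p in zip(gold, pred) for a, b in zip(g, p))
    return wrong / total


def micro_f_measure(gold, pred):
    """F-measure over all label decisions pooled together (assumed averaging)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for a, b in zip(g, p):
            tp += a == 1 and b == 1
            fp += a == 0 and b == 1
            fn += a == 1 and b == 0
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted label vectors for four tweets, three labels each.
gold = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0]]
pred = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1]]

print("exact match:", exact_match_accuracy(gold, pred))
print("Hamming loss:", hamming_loss(gold, pred))
print("micro F-measure:", round(micro_f_measure(gold, pred), 3))
```

Exact match accuracy penalizes a tweet for any single wrong label, while Hamming loss counts each label decision separately, which is why a system can score well on one and less well on the other.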
|
37
|
Causal Relationships Among Pollen Counts, Tweet Numbers, and Patient Numbers for Seasonal Allergic Rhinitis Surveillance: Retrospective Analysis. J Med Internet Res 2019; 21:e10450. [PMID: 30785411 PMCID: PMC6401667 DOI: 10.2196/10450] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/20/2018] [Revised: 11/08/2018] [Accepted: 12/10/2018] [Indexed: 12/29/2022] Open
Abstract
Background Health-related social media data are increasingly used in disease-surveillance studies, which have demonstrated moderately high correlations between the number of social media posts and the number of patients. However, there is a need to understand the causal relationship between the behavior of social media users and the actual number of patients in order to increase the credibility of disease surveillance based on social media data. Objective This study aimed to clarify the causal relationships among pollen count, the posting behavior of social media users, and the number of patients with seasonal allergic rhinitis in the real world. Methods This analysis was conducted using datasets of pollen counts, tweet numbers, and numbers of patients with seasonal allergic rhinitis from Kanagawa Prefecture, Japan. We examined daily pollen counts for Japanese cedar (the major cause of seasonal allergic rhinitis in Japan) and hinoki cypress (which commonly complicates seasonal allergic rhinitis) from February 1 to May 31, 2017. The daily numbers of tweets that included the keyword “kafunshō” (or seasonal allergic rhinitis) were calculated between January 1 and May 31, 2017. Daily numbers of patients with seasonal allergic rhinitis from January 1 to May 31, 2017, were obtained from three healthcare institutes that participated in the study. The Granger causality test was used to examine the causal relationships among pollen count, tweet numbers, and the number of patients with seasonal allergic rhinitis from February to May 2017. To determine if time-variant factors affect these causal relationships, we analyzed the main seasonal allergic rhinitis phase (February to April) when Japanese cedar trees actively produce and release pollen. Results Increases in pollen count were found to increase the number of tweets during the overall study period (P=.04), but not the main seasonal allergic rhinitis phase (P=.05). 
In contrast, increases in pollen count were found to increase patient numbers in both the overall study period (P=.04) and the main seasonal allergic rhinitis phase (P=.01). Increases in the number of tweets increased the patient numbers during the main seasonal allergic rhinitis phase (P=.02), but not the overall study period (P=.89). Patient numbers did not affect the number of tweets in either the overall study period (P=.24) or the main seasonal allergic rhinitis phase (P=.47). Conclusions Understanding the causal relationships among pollen counts, tweet numbers, and numbers of patients with seasonal allergic rhinitis is an important step toward increasing the credibility of surveillance systems that use social media data. Further in-depth studies are needed to identify the determinants of social media posts described in this exploratory analysis.
|
38
|
Idea density in Japanese for the early detection of dementia based on narrative speech. PLoS One 2018; 13:e0208418. [PMID: 30517200 PMCID: PMC6281229 DOI: 10.1371/journal.pone.0208418] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/05/2018] [Accepted: 11/13/2018] [Indexed: 11/19/2022] Open
Abstract
Background Idea density (ID), a natural language processing–based index, was developed to aid in the detection of dementia through the analysis of English narratives. However, it has not been applied to non-English languages due to the difficulties in translating grammatical concepts. In this study, we defined rules to count ideas in Japanese narratives based on a previous study and proposed a novel method to estimate ID in Japanese text using machine translation. Materials The study participants comprised 42 Japanese patients with dementia aged 69–98 years (mean: 84.95 years). We collected free narratives from the participants to build a speech corpus. The narratives of the patients were translated into English using three machine translation systems: Google Translate, Bing Translator, and Excite Translator. The ID in the translated text was then calculated using the Dependency-based Propositional ID (DEPID), an English ID scoring tool. Results The maximum correlation coefficient between ID calculated using DEPID-R-ADD (a modified DEPID method to calculate ID after removing vague sentences) and the Mini-Mental State Examination score was 0.473, indicating a moderate correlation. Discussion The results demonstrate the feasibility of machine translation-based ID measurement. We believe that the basic concept of this translation approach can be applied to other non-English languages.
|
39
|
Extraction and Standardization of Patient Complaints from Electronic Medication Histories for Pharmacovigilance: Natural Language Processing Analysis in Japanese. JMIR Med Inform 2018; 6:e11021. [PMID: 30262450 PMCID: PMC6231790 DOI: 10.2196/11021] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/12/2018] [Revised: 08/07/2018] [Accepted: 08/25/2018] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Despite the growing number of studies using natural language processing for pharmacovigilance, there are few reports on manipulating free text patient information in Japanese. OBJECTIVE This study aimed to establish a method of extracting and standardizing patient complaints from electronic medication histories accumulated in a Japanese community pharmacy for the detection of possible adverse drug event (ADE) signals. METHODS Subjective information included in electronic medication history data provided by a Japanese pharmacy operating in Hiroshima, Japan from September 1, 2015 to August 31, 2016, was used as patients' complaints. We formulated search rules based on morphological analysis and daily (nonmedical) speech and developed a system that automatically executes the search rules and annotates free text data with International Classification of Diseases, Tenth Revision (ICD-10) codes. The performance of the system was evaluated through comparisons with data manually annotated by health care workers for a data set of 5000 complaints. RESULTS Of 5000 complaints, the system annotated 2236 complaints with ICD-10 codes, whereas health care workers annotated 2348 statements. There was a match in the annotation of 1480 complaints between the system and manual work. System performance was .66 regarding precision, .63 in recall, and .65 for the F-measure. CONCLUSIONS Our results suggest that the system may be helpful in extracting and standardizing patients' speech related to symptoms from massive amounts of free text data, replacing manual work. After improving the extraction accuracy, we expect to utilize this system to detect signals of possible ADEs from patients' complaints in the future.
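The reported precision, recall, and F-measure can be reproduced from the three counts given in the results. This worked example assumes that the manual annotations serve as the gold standard and that the 1480 matching annotations are the true positives; the variable names are introduced here for illustration.

```python
# Counts reported in the abstract.
system_annotated = 2236   # complaints the system annotated with ICD-10 codes
manual_annotated = 2348   # complaints annotated by health care workers
matched = 1480            # annotations on which system and manual work agreed

# Assumption: matched annotations are true positives against the manual gold standard.
precision = matched / system_annotated
recall = matched / manual_annotated
f_measure = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Under these assumptions the values round to .66, .63, and .65, matching the figures in the abstract.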
|
40
|
Twitter-Based Influenza Detection After Flu Peak via Tweets With Indirect Information: Text Mining Study. JMIR Public Health Surveill 2018; 4:e65. [PMID: 30274968 PMCID: PMC6231889 DOI: 10.2196/publichealth.8627] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 08/02/2017] [Revised: 02/24/2018] [Accepted: 07/18/2018] [Indexed: 11/13/2022] Open
Abstract
Background The recent rise in popularity and scale of social networking services (SNSs) has resulted in an increasing need for SNS-based information extraction systems. A popular application of SNS data is health surveillance for predicting an outbreak of epidemics by detecting diseases from text messages posted on SNS platforms. Such applications share the following logic: they incorporate SNS users as social sensors. These social sensor–based approaches also share a common problem: SNS-based surveillance is much more reliable if sufficient numbers of users are active, and small or inactive populations produce inconsistent results. Objective This study proposes a novel approach to estimate the trend of patient numbers using indirect information covering both urban areas and rural areas within the posts. Methods We presented a TRAP model by embedding both direct information and indirect information. A collection of tweets spanning 3 years (7 million influenza-related tweets in Japanese) was used to evaluate the model. Both direct information and indirect information that mention other places were used. As indirect information is less reliable (too noisy or too old) than direct information, the indirect information data were not used directly and were considered as inhibiting direct information. For example, when indirect information appeared often, it was considered as signifying that everyone already had a known disease, leading to a small amount of direct information. Results The estimation performance of our approach was evaluated using the correlation coefficient between the number of influenza cases as the gold standard values and the values estimated by the proposed models. The baseline model (BASELINE+NLP) achieved a correlation of .36, and the proposed model (TRAP+NLP) improved it to .70 (+.34 points). 
Conclusions The proposed approach by which the indirect information inhibits direct information exhibited improved estimation performance not only in rural cities but also in urban cities, which demonstrated the effectiveness of the proposed method consisting of a TRAP model and natural language processing (NLP) classification.
|
41
|
Crowdsourced Identification of Possible Allergy-Associated Factors: Automated Hypothesis Generation and Validation Using Crowdsourcing Services. JMIR Res Protoc 2017; 6:e83. [PMID: 28512079 PMCID: PMC5449648 DOI: 10.2196/resprot.5851] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/08/2016] [Revised: 10/31/2016] [Accepted: 03/23/2017] [Indexed: 12/31/2022] Open
Abstract
Background Hypothesis generation is an essential task for clinical research, and it can require years of research experience to formulate a meaningful hypothesis. Recent studies have endeavored to apply crowdsourcing to generate novel hypotheses for research. In this study, we apply crowdsourcing to explore previously unknown allergy-associated factors. Objective In this study, we aimed to collect and test hypotheses of unknown allergy-associated factors using a crowdsourcing service. Methods Using a series of questionnaires, we asked crowdsourcing participants to provide hypotheses on associated factors for seven different allergies, and validated the candidate hypotheses with odds ratios calculated for each associated factor. We repeated this abductive validation process to identify a set of reliable hypotheses. Results We obtained two primary findings: (1) crowdsourcing showed that 8 of the 13 known hypothesized allergy risks were statistically significant; and (2) among the total of 157 hypotheses generated by the crowdsourcing service, 75 hypotheses were statistically significant allergy-associated factors, comprising the 8 known risks and 53 previously unknown allergy-associated factors. These findings suggest that there are still many topics to be examined in future allergy studies. Conclusions Crowdsourcing generated new hypotheses on allergy-associated factors. In the near future, clinical trials should be conducted to validate the hypotheses generated in this study.
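The odds-ratio validation step can be illustrated with a 2x2 table of questionnaire responses. The counts below are hypothetical, and the abstract does not state how significance was assessed, so the 95% confidence interval based on the standard error of the log odds ratio is an assumption used only for this sketch.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a: factor present, allergy present    b: factor present, allergy absent
    c: factor absent,  allergy present    d: factor absent,  allergy absent
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


# Hypothetical responses: 100 participants with the candidate factor, 100 without.
estimate, low, high = odds_ratio_with_ci(a=30, b=70, c=15, d=85)
print(f"OR={estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
# Under this assumed test, a factor is treated as significant when the interval excludes 1.
print("significant:", low > 1 or high < 1)
```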
|
42
|
MedEx/J: A One-Scan Simple and Fast NLP Tool for Japanese Clinical Texts. Stud Health Technol Inform 2017; 245:285-288. [PMID: 29295100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Because of recent replacement of physical documents with electronic medical records (EMR), the importance of information processing in the medical field has increased. In light of this trend, we have been developing MedEx/J, which retrieves important Japanese language information from medical reports. MedEx/J executes two tasks simultaneously: (1) term extraction, and (2) positive and negative event classification. We designate this approach as a one-scan approach, providing simplicity of systems and reasonable accuracy. MedEx/J performance on the two tasks is described herein: (1) term extraction (F-measure with β = 1: 0.87) and (2) positive-negative classification (F-measure with β = 1: 0.63). This paper also presents discussion and explains remaining issues in the medical natural language processing field.
|
43
|
Conditional Density Estimation of Tweet Location: A Feature-Dependent Approach. Stud Health Technol Inform 2017; 245:408-411. [PMID: 29295126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Twitter-based public health surveillance systems have achieved many successes. Underlying this success is the useful information associated with tweets, such as temporal and spatial information. For fine-grained investigation of disease propagation, this information plays an even more important role. Unlike temporal information, which is always available, spatial information is less available because of privacy concerns. To extend the availability of spatial information, many geographic identification systems have been developed. However, for most tweets the origin of the user location cannot be identified, even if a human reads the tweet contents. This study estimates the geographic origin of tweets, together with the reliability of the estimate, using a density estimation approach. Our method reveals how the model interprets the origin of user location according to the spread of estimated density.
|
44
|
Vocabulary Size in Speech May Be an Early Indicator of Cognitive Impairment. PLoS One 2016; 11:e0155195. [PMID: 27176919 PMCID: PMC4866705 DOI: 10.1371/journal.pone.0155195] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 08/11/2015] [Accepted: 04/25/2016] [Indexed: 11/18/2022] Open
Abstract
Little is known about the relationship between mild cognitive impairment (MCI) and changes to language abilities. Here, we used the revised Hasegawa Dementia Scale (HDS-R) to identify suspected MCI in elderly individuals. We then analyzed written and spoken narratives to compare the language abilities between study participants with and without MCI in order to explore the relationship between cognitive and language abilities, and to identify a possible indicator for the early detection of MCI and dementia. We recruited 22 people aged 74 to 86 years (mean: 78.32 years; standard deviation: 3.36). The participants were requested to write and talk about one of the happiest events in their lives. Based on HDS-R scores, we divided the participants into 2 groups: the MCI Group comprised 8 participants with a score of 26 or lower, while the Healthy Group comprised 14 participants with a score of 27 or higher. The transcriptions of both written and spoken samples for each participant were used in the measurement of NLP-based language ability scores. Our analysis showed no significant differences in writing abilities between the 2 groups in any of the language ability scores. However, analysis of the spoken narrative showed that the MCI Group had a significantly larger vocabulary size. In addition, analysis of a metric that signified the gap in content between the spoken and written narratives also revealed a larger vocabulary size in the MCI Group. Individuals with early-stage MCI may be engaging in behavior to conceal their deteriorating cognition, thereby leading to a temporary increase in their active spoken vocabulary. These results indicate the possible detection of early stages of reduced cognition before dementia onset through the analysis of spoken narratives.
|
45
|
Understanding the Relationship between Social Cognition and Word Difficulty. A Language Based Analysis of Individuals with Autism Spectrum Disorder. Methods Inf Med 2015; 54:522-9. [PMID: 26391807 DOI: 10.3414/me15-01-0038] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/07/2015] [Accepted: 06/11/2015] [Indexed: 11/09/2022]
Abstract
BACKGROUND Few quantitative studies have been conducted on the relationship between society and its languages. Individuals with autistic spectrum disorder (ASD) are known to experience social hardships, and a wide range of clinical information about their quality of life has been provided through numerous narrative analyses. However, the narratives of ASD patients have thus far been examined mainly through qualitative approaches. OBJECTIVES In this study, we analyzed adults with ASD to quantitatively examine the relationship between language abilities and ASD severity scores. METHODS We generated phonetic transcriptions of speeches by 16 ASD adults at an ASD workshop, and divided the participants into 2 groups according to their Social Responsiveness Scale™, 2nd Edition (SRS™-2) scores (where higher scores represent more severe ASD): Group A comprised high-scoring ASD adults (SRS™-2 score: ≥ 76) and Group B comprised low- and intermediate-scoring ASD adults (SRS™-2 score: < 76). Using natural language processing (NLP)-based analytical methods, the narratives were converted into numerical data according to four language ability indicators, and the relationships between the language ability scores and ASD severity scores were compared. RESULTS AND DISCUSSION Group A showed a marginally negative correlation with the level of Japanese word difficulty (p < .10), while the "social cognition" subscale of the SRS™-2 score showed a significantly negative correlation (p < .05) with word difficulty. When comparing only male participants, Group A demonstrated a significantly lower correlation with word difficulty level than Group B (p < .10). CONCLUSION Social communication was found to be strongly associated with the level of word difficulty in speech. The clinical applications of these findings may be available in the near future, and there is a need for further detailed study on language metrics designed for ASD adults.
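The group split and the correlation between severity and a language score can be sketched as follows. This is an illustration under stated assumptions: the abstract gives the cutoff of 76 but does not name the correlation coefficient used, so Pearson's r is assumed here, and the function names and any input values are hypothetical.

```python
from math import sqrt


def split_by_srs2(scores: dict, cutoff: int = 76):
    """Group A: score >= cutoff (high-scoring); Group B: score < cutoff."""
    group_a = {pid: s for pid, s in scores.items() if s >= cutoff}
    group_b = {pid: s for pid, s in scores.items() if s < cutoff}
    return group_a, group_b


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed; not stated in the abstract)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical participants and scores, not study data.
group_a, group_b = split_by_srs2({"p1": 80, "p2": 76, "p3": 60})
print(sorted(group_a), sorted(group_b))
```

A negative value of `pearson_r` between severity scores and word-difficulty scores would correspond to the direction of association reported in the abstract.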
|
46
|
Blog Posting After Lung Cancer Notification: Content Analysis of Blogs Written by Patients or Their Families. JMIR Cancer 2015; 1:e5. [PMID: 28410169 DOI: 10.2196/cancer.3883] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/30/2014] [Revised: 04/09/2015] [Accepted: 04/17/2015] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The advent and spread of the Internet has changed the way societies communicate. A portion of information on the Internet may constitute an important source of information concerning the experiences and thoughts of patients and their families. Patients and their families use blogs to obtain updated information, search for alternative treatments, facilitate communication with other patients, and receive emotional support. However, much of this information has yet to be actively utilized by health care professionals. OBJECTIVE We analyzed health-related information in blogs from Japan, focusing on the feelings and satisfaction levels of lung cancer patients or their family members after being notified of their disease. METHODS We collected 100 blogs written in Japanese by patients (or their families) who had been diagnosed with lung cancer by a physician. These 100 blog posts were searchable between June 1 and June 30, 2013. We focused on blog posts that addressed the lung cancer notification event. We analyzed the data using two different approaches (Analysis A and Analysis B). Analysis A was blog content analysis in which we analyzed the content addressing the disease notification event in each blog. Analysis B was an analysis of patient dissatisfaction and anxiety. Detailed blog content regarding patient dissatisfaction and anxiety at the individual sentence level was coded and analyzed. RESULTS The 100 blog posts were written by 48 men, 46 women, and 6 persons whose sex was undisclosed. The average age of the blog authors was 52.4 years. With regard to cancer staging, there were 5 patients at Stage I, 3 patients at Stage II, 14 patients at Stage III, 21 patients at Stage IV, and 57 patients without a disclosed cancer stage. The results of Analysis A showed that the proportion of patients who were dissatisfied with the level of health care exceeded that of satisfied patients (22% vs 8%). From the 2499 sentences in the 100 blog posts analyzed, we identified expressions of dissatisfaction and anxiety in 495 sentences. Our results showed that there were substantially more posts concerning "Way of living, reasons for living, set of values" and "Relationships with medical staff (own hospital)" than in previous studies (Analysis B). CONCLUSIONS This study provides insight into the feelings of dissatisfaction and anxieties held by lung cancer patients and their families, including those regarding the "Way of living, reasons for living, set of values" and "Relationship with medical staff (own hospital)," which were inaccessible in previous survey analyses. When comparing information obtained from patients' voluntary records and those from previous surveys conducted by health care institutions, it is likely that the former would be more indicative of patients' actual opinions and feelings. Therefore, it is important to utilize such records as an information resource.
|
47
|
Allergy Risk Finder: Hypothesis Generation System for Allergy Risks via Web Service. Stud Health Technol Inform 2015; 216:1113. [PMID: 26262412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
This study's aim was to build a web service that automatically collects and tests hypotheses for possible allergy risks. We crowdsourced hypotheses about unknown allergy risks and obtained odds ratios for them. Using the collected hypotheses, we built a web service that estimates allergy risks from a questionnaire (consisting of 10 questions that we gathered from the crowdsourcing task); at the end of the questionnaire, the service asks users to submit their own new hypotheses on possible allergy risks, to contribute to finding the causes of allergy. In the near future, clinical trials to validate the hypotheses found in this study are desired.
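An odds ratio for one crowdsourced hypothesis can be computed from a 2x2 table of responses, as in the sketch below. The abstract does not describe how its odds ratios were calculated, so this shows only the standard definition; the function name and the counts are hypothetical.

```python
def odds_ratio(exposed_cases: int, exposed_controls: int,
               unexposed_cases: int, unexposed_controls: int) -> float:
    """Odds ratio of a 2x2 table: (a / b) / (c / d) = (a * d) / (b * c)."""
    if exposed_controls == 0 or unexposed_cases == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Hypothetical counts: 30 of 100 exposed respondents report an allergy,
# versus 15 of 100 unexposed respondents.
print(round(odds_ratio(30, 70, 15, 85), 2))
```

A value above 1 suggests that the exposure is associated with higher odds of allergy among respondents; it does not by itself establish a cause, which is why the abstract calls for validation in clinical trials.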
|
48
|
Mind the Gap: The Discrepancies between Patient Self-Reported Quality of Life and Medical Staff-Estimated Quality of Life. Stud Health Technol Inform 2015; 216:511-514. [PMID: 26262103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Information on patient quality of life (QOL) is essential to many clinical decisions. Therefore, studies that aim to extract QOL information from patient narratives are increasingly drawing attention. Also, several studies have noted that web services for patients, such as patient social networking services, may represent promising resources for QOL research. However, it is still unclear whether patient narrative text contains as much QOL information as self-reported QOL. This study investigates whether medical staff can accurately estimate patient QOL from patient narrative texts alone. We analyzed (1) QOL of cancer patients estimated by medical staff from patient autobiographical texts and (2) self-reported QOL scores of cancer patients. We compared patients from the following 3 disease groups: (1) gastrointestinal cancer, (2) breast cancer, and (3) lymphoma. The SF-36v2™ Health Survey was used to measure patient QOL in both materials, and the QOLs were compared. We found significant differences between self-reported QOL and estimated QOL in breast cancer patients and lymphoma patients, but not in gastrointestinal cancer patients. In particular, the medical staff tended to underestimate physical QOL scores. Medical staff may underestimate several aspects of QOL scores. On the basis of these results, we may be able to achieve more precise QOL estimation from patient narratives.
|
49
|
Abstract
Purpose
– The aim of this paper is to elucidate rumor propagation on microblogs and to assess a system for collecting rumor information to prevent rumor-spreading.
Design/methodology/approach
– We present a case study of how rumors spread on Twitter during a recent disaster situation, the Great East Japan earthquake of March 11, 2011, based on comparison to a normal situation. We specifically examine rumor disaffirmation because automatic rumor extraction is difficult. Extracting rumor-disaffirmation is easier than extracting the rumors themselves. We classify tweets in disaster situations, analyze tweets in disaster situations based on users' impressions and compare the spread of rumor tweets in a disaster situation to that in a normal situation.
Findings
– The analysis showed the following characteristics of rumors in a disaster situation. Information transmission accounts for 74.9 per cent of tweets, the largest category in our data set. Rumor tweets strongly facilitate users' behavior, make them feel negative and foment disorder. Rumors in a normal situation spread through many hierarchical levels, whereas rumors in a disaster situation spread through only two or three, which means that the style of rumor spreading differs between disaster situations and normal situations.
Originality/value
– The originality of this paper lies in targeting rumors on Twitter and analyzing rumor characteristics from multiple aspects, using not only rumor tweets but also disaffirmation tweets as objects of investigation.
|
50
|
Abstract
Numerous IT-based diabetes-management systems and programs for improving glycemic control to meet guideline targets have been proposed. However, all of them allow only limited, or no, real-time interaction between patients and the system in terms of system response to patient input; few studies have effectively assessed the systems' usability and feasibility to determine how well patients understand and can adopt the technology involved. DialBetics is composed of 4 modules: (1) data transmission module, (2) evaluation module, (3) communication module, and (4) dietary evaluation module. A 3-month randomized study was designed to assess the safety and usability of a remote health-data monitoring system, and especially its impact on modifying patient lifestyles to improve diabetes self-management and, thus, clinical outcomes. Fifty-four type 2 diabetes patients were randomly divided into 2 groups, 27 in the DialBetics group and 27 in the non-DialBetics control group. HbA1c and fasting blood sugar (FBS) values declined significantly in the DialBetics group: HbA1c decreased an average of 0.4% (from 7.1 ± 1.0% to 6.7 ± 0.7%) compared with an average increase of 0.1% in the non-DialBetics group (from 7.0 ± 0.9% to 7.1 ± 1.1%) (P = .015); the DialBetics group FBS decreased an average of 5.5 mg/dl compared with a non-DialBetics group average increase of 16.9 mg/dl (P = .019). BMI improvement, although not statistically significant because of the small sample size, was greater in the DialBetics group. DialBetics was shown to be a feasible and effective tool for improving HbA1c by providing patients with real-time support based on their measurements and inputs.
|