1
Kamba M, She WJ, Ferawati K, Wakamiya S, Aramaki E. Exploring the Impact of the COVID-19 Pandemic on Twitter in Japan: Qualitative Analysis of Disrupted Plans and Consequences. JMIR Infodemiology 2024; 4:e49699. [PMID: 38557446 PMCID: PMC10986681 DOI: 10.2196/49699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 08/11/2023] [Accepted: 03/06/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND The impact of the COVID-19 pandemic extends beyond public health, influencing areas such as the economy, education, work style, and social relationships. Research studies that document public opinions and estimate the potential long-term impact after the pandemic can be of value to the field. OBJECTIVE This study aims to uncover and track concerns in Japan throughout the COVID-19 pandemic by analyzing Japanese individuals' self-disclosure of disruptions to their life plans on social media. This approach offers alternative evidence for identifying concerns that may require further attention for individuals living in Japan. METHODS We extracted 300,778 tweets using the query phrase Corona-no-sei ("due to COVID-19," "because of COVID-19," or "considering COVID-19"), enabling us to identify the activities and life plans disrupted by the pandemic. The correlation between the number of tweets and COVID-19 cases was analyzed, along with an examination of frequently co-occurring words. RESULTS The top 20 nouns, verbs, and noun plus verb pairs co-occurring with Corona-no-sei were extracted. The top 5 keywords were graduation ceremony, cancel, school, work, and event. The top 5 verbs were disappear, go, rest, can go, and end. Our findings indicate that education emerged as the top concern when the Japanese government announced the first state of emergency. We also observed a sudden surge in anxiety about shortages of materials such as toilet paper. As the pandemic persisted and more states of emergency were declared, we noticed a shift toward long-term concerns, including careers, social relationships, and education. CONCLUSIONS Our study incorporated machine learning techniques for disease monitoring through the use of tweet data, allowing the identification of underlying concerns (eg, disrupted education and work conditions) throughout the 3 stages of Japanese government emergency announcements. 
The comparison with COVID-19 case numbers provides valuable insights into the short- and long-term societal impacts, emphasizing the importance of considering citizens' perspectives in policy-making and supporting those affected by the pandemic, particularly in the context of Japanese government decision-making.
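The co-occurrence step described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the romanized token `corona-no-sei`, the function name `cooccurrence_counts`, and the sample tweets are all assumptions made for illustration, and real Japanese text would first need a morphological analyzer to produce tokens.

```python
from collections import Counter

# Assumption: a romanized stand-in for the Japanese query phrase.
QUERY = "corona-no-sei"


def cooccurrence_counts(tokenized_tweets, query=QUERY):
    """Count tokens that appear in the same tweet as the query phrase."""
    counts = Counter()
    for tokens in tokenized_tweets:
        if query in tokens:
            # Count each distinct token once per tweet, excluding the query itself.
            counts.update(set(tokens) - {query})
    return counts


# Hypothetical, already-tokenized tweets (not data from the study).
tweets = [
    ["corona-no-sei", "graduation", "ceremony", "cancel"],
    ["corona-no-sei", "school", "cancel"],
    ["work", "event"],  # no query phrase, so it is ignored
]

counts = cooccurrence_counts(tweets)
top_words = counts.most_common(5)
```

Counting each token once per tweet keeps a single long tweet from dominating the ranking; a per-occurrence count would be an equally reasonable choice.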
Affiliation(s)
- Masaru Kamba
  - Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Wan Jou She
  - Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Kiki Ferawati
  - Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Shoko Wakamiya
  - Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Eiji Aramaki
  - Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
2
Yao LFL, Liew K, Wakamiya S, Aramaki E. Extracting Spatio-Temporal Trends in Medical Research Prioritization Through Natural Language Processing of Case Report Abstracts. Stud Health Technol Inform 2024; 310:634-638. [PMID: 38269886 DOI: 10.3233/shti231042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Medical research prioritization is an important aspect of decision-making by researchers and relevant stakeholders. The ever-increasing availability of technology and data has opened doors to new discoveries and new questions. This makes it difficult for researchers and relevant stakeholders to make well-informed decisions about the research areas they want to support and the nations with which they should seek collaborations. It is, therefore, useful to look at the spatio-temporal trends of medical research prioritization to gain insight into popular and neglected areas of research as well as how each nation allocates its priorities. In this study, we develop a system that collects, classifies, and summarizes case report abstracts according to the location, time, and disease category of the report. The additional classifications allow us to visualize and monitor the trends in medical research prioritization by location, time, and disease category.
3
Otsuka N, Kawanishi Y, Doi F, Takeda T, Okumura K, Yamauchi T, Yada S, Wakamiya S, Aramaki E, Makinodan M. Diagnosing psychiatric disorders from history of present illness using a large-scale linguistic model. Psychiatry Clin Neurosci 2023; 77:597-604. [PMID: 37526294 DOI: 10.1111/pcn.13580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 07/22/2023] [Accepted: 07/27/2023] [Indexed: 08/02/2023]
Abstract
AIM Recent advances in natural language processing models are expected to provide diagnostic assistance in psychiatry from the history of present illness (HPI). However, existing studies have been limited, with the target diseases including only major diseases, small sample sizes, or no comparison with diagnoses made by psychiatrists to ensure accuracy. Therefore, we formulated an accurate diagnostic model that covers all psychiatric disorders. METHODS HPIs and diagnoses were extracted from discharge summaries of 2,642 cases at the Nara Medical University Hospital, Japan, from 21 May 2007 to 31 May 2021. The diagnoses were classified into 11 classes according to the code from ICD-10 Chapter V. Using UTH-BERT pre-trained on the electronic medical records of the University of Tokyo Hospital, Japan, we predicted the main diagnoses at discharge based on HPIs and compared the concordance rate with the results of psychiatrists. The psychiatrists were divided into two groups: semi-Designated with 3-4 years of experience and Residents with only 2 months of experience. RESULTS The model's match rate was 74.3%, compared to 71.5% for the semi-Designated psychiatrists and 69.4% for the Residents. If the cases were limited to those correctly answered by the semi-Designated group, the model and the Residents performed at 84.9% and 83.3%, respectively. CONCLUSION We demonstrated that the diagnosis the model predicted from the HPI matched the principal diagnosis at discharge with a high probability. Hence, the model can provide diagnostic suggestions in actual clinical practice.
Affiliation(s)
- Norio Otsuka
  - Department of Psychiatry, Nara Medical University, Kashihara, Japan
- Yuu Kawanishi
  - Department of Psychiatry, Nara Medical University, Kashihara, Japan
- Fumimaro Doi
  - Department of Psychiatry, Nara Medical University, Kashihara, Japan
- Tsutomu Takeda
  - Department of Psychiatry, Nara Medical University, Kashihara, Japan
- Kazuki Okumura
  - Department of Psychiatry, Nara Medical University, Kashihara, Japan
- Shuntaro Yada
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Shoko Wakamiya
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Eiji Aramaki
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Manabu Makinodan
  - Department of Psychiatry, Nara Medical University, Kashihara, Japan
4
Azuaje G, Liew K, Buening R, She WJ, Siriaraya P, Wakamiya S, Aramaki E. Exploring the use of AI text-to-image generation to downregulate negative emotions in an expressive writing application. R Soc Open Sci 2023; 10:220238. [PMID: 36636309 PMCID: PMC9810434 DOI: 10.1098/rsos.220238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 11/30/2022] [Indexed: 06/17/2023]
Abstract
Conventional writing therapies are versatile, accessible and easy to facilitate online, but often require participants to self-disclose traumatic experiences. To make expressive writing therapies safer for online, unsupervised environments, we explored the use of text-to-image generation as a means to downregulate negative emotions during a fictional writing exercise. We developed a writing tool, StoryWriter, that uses Generative Adversarial Network models to generate artwork from users' narratives in real time. These images were intended to positively distract users from their negative emotions throughout the writing task. In this paper, we report the outcomes of two user studies: Study 1 (N = 388), which experimentally examined the efficacy of this application via negative versus neutral emotion induction and image generation versus no image generation control groups; and Study 2 (N = 54), which qualitatively examined open-ended feedback. Our results are heterogeneous: both studies suggested that StoryWriter somewhat contributed to improved emotion outcomes for participants with pre-existing negative emotions, but users' open-ended responses indicated that these outcomes may be adversely modulated by the generated images, which could undermine the therapeutic benefits of the writing task itself.
Affiliation(s)
- Gamar Azuaje
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Japan
- Kongmeng Liew
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Japan
- Rebecca Buening
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Japan
- Wan Jou She
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Japan
  - Center for Research on End of Life Care, Weill Cornell Medicine, 1300 York Avenue, New York, NY 10065, USA
- Panote Siriaraya
  - Faculty of Information and Human Sciences, Kyoto Institute of Technology, Matsugasaki Hashikamicho, Sakyo Ward, Kyoto, Japan
- Shoko Wakamiya
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Japan
- Eiji Aramaki
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Japan
5
Yao LF, Ferawati K, Liew K, Wakamiya S, Aramaki E. The Disruption of the Cystic Fibrosis Community’s Experiences and Concerns during the COVID-19 Pandemic: Topic Modeling and Time Series Analysis of Reddit Comments (Preprint). J Med Internet Res 2022; 25:e45249. [PMID: 37079359 PMCID: PMC10160941 DOI: 10.2196/45249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/18/2023]
Abstract
BACKGROUND The COVID-19 pandemic disrupted the needs and concerns of the cystic fibrosis community. Patients with cystic fibrosis were particularly vulnerable during the pandemic due to overlapping symptoms in addition to the challenges patients with rare diseases face, such as the need for constant medical aid and limited information regarding their disease or treatments. Even before the pandemic, patients vocalized these concerns on social media platforms like Reddit and formed communities and networks to share insight and information. This data can be used as a quick and efficient source of information about the experiences and concerns of patients with cystic fibrosis in contrast to traditional survey- or clinical-based methods. OBJECTIVE This study applies topic modeling and time series analysis to identify the disruption caused by the COVID-19 pandemic and its impact on the cystic fibrosis community's experiences and concerns. This study illustrates the utility of social media data in gaining insight into the experiences and concerns of patients with rare diseases. METHODS We collected comments from the subreddit r/CysticFibrosis to represent the experiences and concerns of the cystic fibrosis community. The comments were preprocessed before being used to train the BERTopic model to assign each comment to a topic. The number of comments and active users for each data set was aggregated monthly per topic and then fitted with an autoregressive integrated moving average (ARIMA) model to study the trends in activity. To verify the disruption in trends during the COVID-19 pandemic, we assigned a dummy variable in the model where a value of "1" was assigned to months in 2020 and "0" otherwise and tested for its statistical significance. RESULTS A total of 120,738 comments from 5827 users were collected from March 24, 2011, until August 31, 2022. We found 22 topics representing the cystic fibrosis community's experiences and concerns. 
Our time series analysis showed that for 9 topics, the COVID-19 pandemic was a statistically significant event that disrupted the trends in user activity. Of the 9 topics, only 1 showed significantly increased activity during this period, while the other 8 showed decreased activity. This mixture of increased and decreased activity for these topics indicates a shift in attention or focus on discussion topics during this period. CONCLUSIONS There was a disruption in the experiences and concerns the cystic fibrosis community faced during the COVID-19 pandemic. By studying social media data, we were able to quickly and efficiently study the impact on the lived experiences and daily struggles of patients with cystic fibrosis. This study shows how social media data can be used as an alternative source of information to gain insight into the needs of patients with rare diseases and how external factors disrupt them.
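The data preparation described in the Methods (monthly aggregation and a dummy variable set to 1 for months in 2020) can be sketched as follows. This covers only the preparation step, not the BERTopic or ARIMA fitting, and it ignores the per-topic split for brevity. The function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(comment_dates):
    """Aggregate comment dates into counts keyed by (year, month)."""
    return Counter((d.year, d.month) for d in comment_dates)


def pandemic_dummy(months, pandemic_year=2020):
    """Return 1 for months in the pandemic year and 0 otherwise."""
    return [1 if year == pandemic_year else 0 for year, _ in months]


# Hypothetical comment timestamps (not data from the study).
dates = [date(2019, 12, 30), date(2020, 1, 2), date(2020, 1, 15), date(2021, 3, 1)]

counts = monthly_counts(dates)
months = sorted(counts)               # chronological (year, month) keys
series = [counts[m] for m in months]  # monthly activity series
dummy = pandemic_dummy(months)        # exogenous regressor for the time series model
```

In practice, months with no comments would need to be filled with zeros before fitting, and `dummy` would be passed to the time series model as an exogenous variable so that its coefficient can be tested for significance.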
Affiliation(s)
- Lean Franzl Yao
  - Social Computing Laboratory, Nara Institute of Science and Technology, Ikoma, Japan
- Kiki Ferawati
  - Social Computing Laboratory, Nara Institute of Science and Technology, Ikoma, Japan
- Kongmeng Liew
  - Social Computing Laboratory, Nara Institute of Science and Technology, Ikoma, Japan
- Shoko Wakamiya
  - Social Computing Laboratory, Nara Institute of Science and Technology, Ikoma, Japan
- Eiji Aramaki
  - Social Computing Laboratory, Nara Institute of Science and Technology, Ikoma, Japan
6
Nishiyama T, Yada S, Wakamiya S, Hori S, Aramaki E. Transferability Based on Drug Structure Similarity in Automatic Classification of Noncompliant Drug Use on Social Media: Natural Language Processing Approach (Preprint). J Med Internet Res 2022; 25:e44870. [PMID: 37133915 DOI: 10.2196/44870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 03/17/2023] [Accepted: 03/29/2023] [Indexed: 03/31/2023]
Abstract
BACKGROUND Medication noncompliance is a critical issue because of the increased number of drugs sold on the web. Web-based drug distribution is difficult to control, causing problems such as drug noncompliance and abuse. The existing medication compliance surveys lack completeness because it is impossible to cover patients who do not go to the hospital or provide accurate information to their doctors, so a social media-based approach is being explored to collect information about drug use. Social media data, which includes information on drug usage by users, can be used to detect drug abuse and medication compliance in patients. OBJECTIVE This study aimed to assess how the structural similarity of drugs affects the efficiency of machine learning models for text classification of drug noncompliance. METHODS This study analyzed 22,022 tweets about 20 different drugs. The tweets were labeled as either noncompliant use or mention, noncompliant sales, general use, or general mention. The study compares 2 methods for training machine learning models for text classification: single-sub-corpus transfer learning, in which a model is trained on tweets about a single drug and then tested on tweets about other drugs, and multi-sub-corpus incremental learning, in which models are trained on tweets about drugs in order of their structural similarity. The performance of a machine learning model trained on a single subcorpus (a data set of tweets about a specific category of drugs) was compared to the performance of a model trained on multiple subcorpora (data sets of tweets about multiple categories of drugs). RESULTS The results showed that the performance of the model trained on a single subcorpus varied depending on the specific drug used for training. The Tanimoto similarity (a measure of the structural similarity between compounds) was weakly correlated with the classification results. 
The model trained by transfer learning on a corpus of drugs with close structural similarity performed better than the model trained by randomly adding a subcorpus when the number of subcorpora was small. CONCLUSIONS The results suggest that structural similarity improves the classification performance of messages about unknown drugs if the drugs in the training corpus are few. On the other hand, this indicates that there is little need to consider the influence of the Tanimoto structural similarity if a sufficient variety of drugs is ensured.
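For readers unfamiliar with the Tanimoto similarity mentioned in the Results, the sketch below computes it on sets of fingerprint bit positions. It is an illustration only: the abstract does not say which fingerprint or toolkit was used, so the function name `tanimoto` and the toy fingerprints are assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit positions.

    Defined as |A and B| / |A or B|: 1.0 for identical sets, 0.0 for disjoint ones.
    """
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical bit positions for two compounds (not real drug fingerprints).
drug_x = {1, 2, 3, 4}
drug_y = {3, 4, 5, 6}

similarity = tanimoto(drug_x, drug_y)  # 2 shared bits out of 6 distinct bits
```

Ordering candidate subcorpora by this score against a target drug is one way the similarity-ordered training described in the Methods could be arranged.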
Affiliation(s)
- Tomohiro Nishiyama
  - Department of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Shuntaro Yada
  - Department of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Shoko Wakamiya
  - Department of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Satoko Hori
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Eiji Aramaki
  - Department of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
7
Uehara M, Fujita S, Shimizu N, Liew K, Wakamiya S, Aramaki E. Measuring concerns about the COVID-19 vaccine among Japanese internet users through search queries. Sci Rep 2022; 12:15037. [PMID: 36057657 PMCID: PMC9440921 DOI: 10.1038/s41598-022-18307-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2022] [Accepted: 08/09/2022] [Indexed: 11/09/2022]
Abstract
With the increasing availability of the COVID-19 vaccines, vaccination has been rapidly promoted globally as a countermeasure against the spread of COVID-19. In Japan, vaccination was first introduced in February 2021. However, the amount of concern about vaccination differs between individuals, and topics of concern include adverse reactions and side effects. This study investigated attitudes toward vaccines or vaccination during the COVID-19 pandemic across different Japanese prefectures, using Yahoo! JAPAN search queries. We first defined a vaccine concern index (VCI) by aggregating the search counts of vaccine-related queries from Yahoo! JAPAN users before examining VCI across all Japanese prefectures, accounting for gender and age. Our results demonstrated that VCI tended to be lower in more populated areas, and that VCI was higher among users in their 20s to 40s than among older users, especially among female users. Furthermore, there was a significant positive correlation (Spearman's rank correlation coefficient [Formula: see text] = 0.60, [Formula: see text]) between VCI and prefectural vaccination rate, suggesting that web searching of adverse vaccine reactions may precede actual vaccination. This could reflect the information-seeking behavior of individuals who are accepting of vaccinations.
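Spearman's rank correlation, the statistic reported above, is the Pearson correlation of the rank-transformed values. A minimal standard-library sketch follows; the function names and the toy prefecture values are hypothetical, and the abstract does not describe how the authors computed the coefficient.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-prefecture values (not data from the study).
vci = [0.8, 1.1, 0.9, 1.4]
vaccination_rate = [0.61, 0.70, 0.66, 0.75]
rho = spearman(vci, vaccination_rate)
```

Because only ranks matter, any monotonic relationship gives a coefficient of 1 or -1 even when it is not linear, which suits an index such as VCI whose scale is arbitrary.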
Affiliation(s)
- Makoto Uehara
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Kongmeng Liew
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shoko Wakamiya
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
8
Mutinda FW, Liew K, Yada S, Wakamiya S, Aramaki E. Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer. BMC Med Inform Decis Mak 2022; 22:158. [PMID: 35717167 PMCID: PMC9206132 DOI: 10.1186/s12911-022-01897-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/22/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022]
Abstract
Background Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis. Materials and methods Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We proposed a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis. Results The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 in most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy; however, the statistical analysis step achieved low performance because abstracts do not always contain all the required information. Conclusion We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis and the system achieved a relatively good performance, though more substantiation is required.
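The step from parsed numeric outcomes to a pooled statistic can be sketched as below. This is an illustration under stated assumptions, not the system described in the abstract: the "N of M" regular expression, the sample sentence, and the choice of a fixed-effect Mantel-Haenszel odds ratio are all hypothetical, and the abstract does not say which pooling method the authors used.

```python
import re

# Assumption: outcome counts are written as "<events> of <total>".
COUNT_PATTERN = re.compile(r"(\d+)\s+of\s+(\d+)")


def parse_events(text):
    """Extract (events, total) pairs from phrases such as '30 of 100'."""
    return [(int(e), int(n)) for e, n in COUNT_PATTERN.findall(text)]


def mantel_haenszel_or(studies):
    """Pooled odds ratio over studies given as
    (events_treatment, total_treatment, events_control, total_control)."""
    numerator = denominator = 0.0
    for a, n_trt, c, n_ctl in studies:
        b = n_trt - a  # treatment arm, no event
        d = n_ctl - c  # control arm, no event
        n = n_trt + n_ctl
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical sentence in the style of a trial abstract.
sentence = "Response occurred in 30 of 100 patients with treatment and 20 of 100 with control."
treatment, control = parse_events(sentence)
pooled_or = mantel_haenszel_or([(treatment[0], treatment[1], control[0], control[1])])
```

With a single study the pooled value reduces to the ordinary odds ratio (30 x 80) / (70 x 20). The function raises `ZeroDivisionError` when every study has an empty cell in the denominator, a case a fuller implementation would need to handle.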
Affiliation(s)
- Faith Wavinya Mutinda
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Kongmeng Liew
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Shuntaro Yada
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Shoko Wakamiya
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Eiji Aramaki
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
9
Nakamura Y, Hanaoka S, Nomura Y, Hayashi N, Abe O, Yada S, Wakamiya S, Aramaki E. Clinical Comparable Corpus Describing the Same Subjects with Different Expressions. Stud Health Technol Inform 2022; 290:253-257. [PMID: 35673012 DOI: 10.3233/shti220073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Medical artificial intelligence (AI) systems need to learn to recognize synonyms or paraphrases describing the same anatomy, disease, treatment, etc. to better understand real-world clinical documents. Existing linguistic resources focus on variants at the word or sentence level. To handle linguistic variations on a broader scale, we proposed the Medical Text Radiology Report section Japanese version (MedTxt-RR-JA), the first clinical comparable corpus. MedTxt-RR-JA was built by recruiting nine radiologists to diagnose the same 15 lung cancer cases in Radiopaedia, an open-access radiological repository. The 135 radiology reports in MedTxt-RR-JA were shown to contain word-, sentence- and document-level variations maintaining similarity of contents. MedTxt-RR-JA is also the first publicly available Japanese radiology report corpus that would help to overcome poor data availability for Japanese medical AI systems. Moreover, our methodology can be applied widely to building clinical corpora without privacy concerns.
Affiliation(s)
- Yuta Nakamura
  - Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan
- Shouhei Hanaoka
  - Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan
  - The Department of Radiology, The University of Tokyo Hospital, Bunkyo, Tokyo, Japan
- Yukihiro Nomura
  - The Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo, Tokyo, Japan
- Naoto Hayashi
  - The Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo, Tokyo, Japan
- Osamu Abe
  - Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan
  - The Department of Radiology, The University of Tokyo Hospital, Bunkyo, Tokyo, Japan
- Shuntaro Yada
  - Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Shoko Wakamiya
  - Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology, Ikoma, Nara, Japan
10
Mutinda FW, Yada S, Wakamiya S, Aramaki E. AUTOMETA: Automatic Meta-Analysis System Employing Natural Language Processing. Stud Health Technol Inform 2022; 290:612-616. [PMID: 35673089 DOI: 10.3233/shti220150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Meta-analyses examine the results of different clinical studies to determine whether a treatment is effective or not. Meta-analyses provide the gold standard for medical evidence. Despite their importance, meta-analyses are time-consuming and this poses a challenge where timeliness is important. Research articles are also increasing rapidly and most meta-analyses become outdated after publication since they have not incorporated new evidence. Therefore, there is increasing interest to automate meta-analysis so as to speed up the process and allow for automatic update when new results are available. In this preliminary study we present AUTOMETA, our proposed system for automating meta-analysis which employs existing natural language processing methods for identifying Participants, Intervention, Control, and Outcome (PICO) elements. We show that our system can perform advanced meta-analyses by parsing numeric outcomes to identify the number of patients having certain outcomes. We also present a new dataset which improves previous datasets by incorporating additional tags to identify detailed information.
Affiliation(s)
- Faith W Mutinda
  - Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Shuntaro Yada
  - Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Shoko Wakamiya
  - Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology, Ikoma, Nara, Japan
11
Abstract
OBJECTIVES Owing to the rapid progress of natural language processing (NLP), the role of NLP in the medical field has gained considerable attention from both the NLP and medical informatics communities. Although numerous medical NLP papers are published annually, there is still a gap between basic NLP research and practical product development. This gap raises questions such as what medical NLP has achieved in each medical field and what the burden is for the practical use of NLP. This paper aims to clarify these questions. METHODS We explore the literature on potential NLP products/services applied to various medical/clinical/healthcare areas. RESULTS This paper introduces clinical applications (bedside applications), describing the use of NLP for each clinical department: internal medicine, pre-surgery, post-surgery, oncology, radiology, pathology, psychiatry, rehabilitation, and obstetrics and gynecology. We also clarify the technical problems to be addressed to encourage bedside applications based on NLP. CONCLUSIONS These results contribute to discussions regarding potentially feasible NLP applications and highlight research gaps for future studies.
Affiliation(s)
- Eiji Aramaki
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shoko Wakamiya
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shuntaro Yada
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Yuta Nakamura
  - Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
12
Ferawati K, Liew K, Aramaki E, Wakamiya S. Monitoring Mentions of COVID-19 Vaccine Side Effects from Japanese and Indonesian Twitter: Infodemiological Study (Preprint). JMIR Infodemiology 2022; 2:e39504. [PMID: 36277140 PMCID: PMC9578292 DOI: 10.2196/39504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 07/29/2022] [Accepted: 09/19/2022] [Indexed: 11/13/2022]
Abstract
Background The year 2021 was marked by vaccinations against COVID-19, which spurred wider discussion among the general population, with some in favor and some against vaccination. Twitter, a popular social media platform, was instrumental in providing information about the COVID-19 vaccine and has been effective in observing public reactions. We focused on tweets from Japan and Indonesia, 2 countries with a large Twitter-using population, where concerns about side effects were consistently stated as a strong reason for vaccine hesitancy. Objective This study aimed to investigate how Twitter was used to report vaccine-related side effects and to compare the mentions of these side effects from 2 messenger RNA (mRNA) vaccine types developed by Pfizer and Moderna, in Japan and Indonesia. Methods We obtained tweet data from Twitter using Japanese and Indonesian keywords related to COVID-19 vaccines and their side effects from January 1, 2021, to December 31, 2021. We then removed users with a high frequency of tweets and merged the tweets from multiple users as a single sentence to focus on user-level analysis, resulting in a total of 214,165 users (Japan) and 12,289 users (Indonesia). Then, we filtered the data to select tweets mentioning Pfizer or Moderna only and removed tweets mentioning both. We compared the side effect counts to the public reports released by Pfizer and Moderna. Afterward, logistic regression models were used to compare the side effects for the Pfizer and Moderna vaccines for each country. Results We observed some differences in the ratio of side effects between the public reports and tweets. Specifically, fever was mentioned much more frequently in tweets than would be expected based on the public reports. 
We also observed differences in side effects reported between Pfizer and Moderna vaccines from Japan and Indonesia, with more side effects reported for the Pfizer vaccine in Japanese tweets and more side effects with the Moderna vaccine reported in Indonesian tweets. Conclusions We note the possible consequences of vaccine side effect surveillance on Twitter and information dissemination, in that fever appears to be over-represented. This could be due to fever possibly having a higher severity or measurability, and further implications are discussed.
Affiliation(s)
- Kiki Ferawati
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Kongmeng Liew
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Eiji Aramaki
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Shoko Wakamiya
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
|
13
|
Wakamiya S, Morimoto O, Omichi K, Hara H, Kawase I, Koshiba R, Aramaki E. Exploring Relationships Between Tweet Numbers and Over-the-counter Drug Sales for Allergic Rhinitis: Retrospective Analysis. JMIR Form Res 2022; 6:e33941. [PMID: 35107434 PMCID: PMC8851323 DOI: 10.2196/33941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/19/2021] [Accepted: 01/04/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Health-related social media data are increasingly being used in disease surveillance studies. In particular, surveillance of infectious diseases such as influenza has demonstrated high correlations between the number of social media posts mentioning the disease and the number of patients who went to the hospital and were diagnosed with the disease. However, the prevalence of some diseases, such as allergic rhinitis, cannot be estimated based on the number of patients alone. Specifically, individuals with allergic rhinitis typically self-medicate by taking over-the-counter (OTC) medications without going to the hospital. Although allergic rhinitis is not a life-threatening disease, it represents a major social problem because it reduces people's quality of life, making it essential to understand its prevalence and people's motives for self-medication behavior. OBJECTIVE This study aims to explore the relationship between the number of social media posts mentioning the main symptoms of allergic rhinitis and the sales volume of OTC rhinitis medications in Japan. METHODS We collected tweets over 4 years (from 2017 to 2020) that included keywords corresponding to the main nasal symptoms of allergic rhinitis: "sneezing," "runny nose," and "stuffy nose." We also obtained the sales volume of OTC drugs, including oral medications and nasal sprays, for the same period. We then calculated the Pearson correlation coefficient between time series data on the number of tweets per week and time series data on the sales volume of OTC drugs per week. RESULTS The results showed a much higher correlation (r=0.8432) between the time series data on the number of tweets mentioning "stuffy nose" and the time series data on the sales volume of nasal sprays than for the other two symptoms. There was also a high correlation (r=0.9317) between the seasonal components of these time series data. 
CONCLUSIONS We investigated the relationships between social media data and behavioral patterns, such as OTC drug sales volume. Exploring these relationships can help us understand the prevalence of allergic rhinitis and the motives for self-care treatment using social media data, which would be useful as a marketing indicator to reduce the number of out-of-stocks in stores, provide (sell) rhinitis medicines to consumers in a stable manner, and reduce the loss of sales opportunities. In the future, in-depth investigations are required to estimate sales volume using social media data, and future research could investigate other diseases and countries.
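The Pearson correlation step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `pearson_r` and the weekly counts are hypothetical, and the sample numbers are invented purely to show the calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly counts, invented for illustration only.
weekly_tweets = [120, 150, 310, 480, 420, 200]
weekly_sales = [30, 35, 70, 110, 95, 50]
r = pearson_r(weekly_tweets, weekly_sales)
print(f"r = {r:.4f}")
```

Both series must be aligned on the same weekly index before the coefficient is computed; the seasonal-component correlation mentioned in the abstract would additionally require a time series decomposition, which this sketch does not attempt.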
Affiliation(s)
- Shoko Wakamiya
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Eiji Aramaki
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
|
14
|
Kamba M, Manabe M, Wakamiya S, Yada S, Aramaki E, Odani S, Miyashiro I. Medical Needs Extraction for Breast Cancer Patients from Question and Answer Services: Natural Language Processing-Based Approach. JMIR Cancer 2021; 7:e32005. [PMID: 34709187 PMCID: PMC8587180 DOI: 10.2196/32005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Revised: 09/25/2021] [Accepted: 10/04/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND A large number of patient narratives are available on various web services. As for web question and answer services, patient questions often relate to medical needs, and we expect these questions to provide clues for a better understanding of patients' medical needs. OBJECTIVE This study aimed to extract patients' needs and classify them into thematic categories. Clarifying patient needs is the first step in solving social issues that patients with cancer encounter. METHODS For this study, we used patient question texts containing the key phrase "breast cancer," available at the Yahoo! Japan question and answer service, Yahoo! Chiebukuro, which contains over 60,000 questions on cancer. First, we converted the question text into a vector representation. Next, the relevance between patient needs and existing cancer needs categories was calculated based on cosine similarity. RESULTS The proportion of correct classifications in our proposed method was approximately 70%. Considering the results of classifying questions, we found the variation and the number of needs. CONCLUSIONS We created 3 corpora to classify the problems of patients with cancer. The proposed method was able to classify the problems considering the question text. Moreover, as an application example, the question text that included the side effect signaling of drugs and the unmet needs of cancer patients could be extracted. Revealing these needs is important to fulfill the medical needs of patients with cancer.
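The vectorize-then-compare procedure in the abstract above can be sketched as below. This is an illustration under loud assumptions: the abstract does not say which vector representation was used, so a plain bag-of-words count vector stands in for it, and the category names and descriptions are hypothetical examples rather than the cancer needs categories used in the study.

```python
from collections import Counter
from math import sqrt


def bow(text):
    """Bag-of-words vector (assumed representation) for whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(question, categories):
    """Return the category whose description is most similar to the question."""
    q = bow(question)
    scores = {name: cosine(q, bow(desc)) for name, desc in categories.items()}
    return max(scores, key=scores.get), scores


# Hypothetical categories, invented for illustration only.
categories = {
    "treatment": "side effects of chemotherapy drug treatment surgery",
    "finance": "cost insurance payment money work leave",
}
label, scores = classify("what are the side effects of this drug", categories)
print(label, scores)
```

Japanese text has no whitespace word boundaries, so a real pipeline would need a morphological analyzer before this step; the whitespace split here is a simplification for the English example.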
Affiliation(s)
- Masaru Kamba
- Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Masae Manabe
- Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Shoko Wakamiya
- Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Shuntaro Yada
- Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Eiji Aramaki
- Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Satomi Odani
- Cancer Control Center, Osaka International Cancer Institute, Osaka, Japan
- Isao Miyashiro
- Cancer Control Center, Osaka International Cancer Institute, Osaka, Japan
|
15
|
Manabe M, Liew K, Yada S, Wakamiya S, Aramaki E. Estimation of Psychological Distress in Japanese Youth Through Narrative Writing: Text-Based Stylometric and Sentiment Analyses. JMIR Form Res 2021; 5:e29500. [PMID: 34387556 PMCID: PMC8391726 DOI: 10.2196/29500] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/13/2021] [Revised: 06/29/2021] [Accepted: 07/06/2021] [Indexed: 11/13/2022]
Abstract
Background Internalizing mental illnesses associated with psychological distress are often underdetected. Text-based detection using natural language processing (NLP) methods is increasingly being used to complement conventional detection efforts. However, these approaches often rely on self-disclosure through autobiographical narratives that may not always be possible, especially in the context of the collectivistic Japanese culture. Objective We propose the use of narrative writing as an alternative resource for mental illness detection in youth. Accordingly, in this study, we investigated the textual characteristics of narratives written by youth with psychological distress; our research focuses on the detection of psychopathological tendencies in written imaginative narratives. Methods Using NLP tools such as stylometric measures and lexicon-based sentiment analysis, we examined short narratives from 52 Japanese youth (mean age 19.8 years, SD 3.1) obtained through crowdsourcing. Participants wrote a short narrative introduction to an imagined story before completing a questionnaire to quantify their tendencies toward psychological distress. Based on this score, participants were categorized into higher distress and lower distress groups. The written narratives were then analyzed using NLP tools and examined for between-group differences. Although outside the scope of this study, we also carried out a supplementary analysis of narratives written by adults using the same procedure. Results Youth demonstrating higher tendencies toward psychological distress used significantly more positive (happiness-related) words, revealing differences in valence of the narrative content. No other significant differences were observed between the high and low distress groups. Conclusions Youth with tendencies toward mental illness were found to write more positive stories that contained more happiness-related terms. 
These results may potentially have widespread implications on psychological distress screening on online platforms, particularly in cultures such as Japan that are not accustomed to self-disclosure. Although the mechanisms that we propose in explaining our results are speculative, we believe that this interpretation paves the way for future research in online surveillance and detection efforts.
Affiliation(s)
- Masae Manabe
- Nara Institute of Science and Technology, Nara, Japan
- Kongmeng Liew
- Nara Institute of Science and Technology, Nara, Japan
- Shuntaro Yada
- Nara Institute of Science and Technology, Nara, Japan
- Eiji Aramaki
- Nara Institute of Science and Technology, Nara, Japan
|
16
|
Gao Z, Fujita S, Shimizu N, Liew K, Murayama T, Yada S, Wakamiya S, Aramaki E. Measuring Public Concern About COVID-19 in Japanese Internet Users Through Search Queries: Infodemiological Study. JMIR Public Health Surveill 2021; 7:e29865. [PMID: 34174781 PMCID: PMC8294121 DOI: 10.2196/29865] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/23/2021] [Revised: 06/01/2021] [Accepted: 06/13/2021] [Indexed: 01/19/2023]
Abstract
Background COVID-19 has disrupted lives and livelihoods and caused widespread panic worldwide. Emerging reports suggest that people living in rural areas in some countries are more susceptible to COVID-19. However, there is a lack of quantitative evidence that can shed light on whether residents of rural areas are more concerned about COVID-19 than residents of urban areas. Objective This infodemiology study investigated attitudes toward COVID-19 in different Japanese prefectures by aggregating and analyzing Yahoo! JAPAN search queries. Methods We measured COVID-19 concerns in each Japanese prefecture by aggregating search counts of COVID-19–related queries of Yahoo! JAPAN users and data related to COVID-19 cases. We then defined two indices—the localized concern index (LCI) and localized concern index by patient percentage (LCIPP)—to quantitatively represent the degree of concern. To investigate the impact of emergency declarations on people's concerns, we divided our study period into three phases according to the timing of the state of emergency in Japan: before, during, and after. In addition, we evaluated the relationship between the LCI and LCIPP in different prefectures by correlating them with prefecture-level indicators of urbanization. Results Our results demonstrated that the concerns about COVID-19 in the prefectures changed in accordance with the declaration of the state of emergency. The correlation analyses also indicated that the differentiated types of public concern measured by the LCI and LCIPP reflect the prefectures’ level of urbanization to a certain extent (ie, the LCI appears to be more suitable for quantifying COVID-19 concern in urban areas, while the LCIPP seems to be more appropriate for rural areas). Conclusions We quantitatively defined Japanese Yahoo users’ concerns about COVID-19 by using the search counts of COVID-19–related search queries. Our results also showed that the LCI and LCIPP have external validity.
Affiliation(s)
- Zhiwei Gao
- Social Computing Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Kongmeng Liew
- Social Computing Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Taichi Murayama
- Social Computing Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Shuntaro Yada
- Social Computing Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Shoko Wakamiya
- Social Computing Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Eiji Aramaki
- Social Computing Laboratory, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
|
17
|
Mutinda FW, Yada S, Wakamiya S, Aramaki E. Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT. Methods Inf Med 2021; 60:e56-e64. [PMID: 34237783 PMCID: PMC8294940 DOI: 10.1055/s-0041-1731390] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/02/2021] [Accepted: 05/18/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND Semantic textual similarity (STS) captures the degree of semantic similarity between texts. It plays an important role in many natural language processing applications such as text summarization, question answering, machine translation, information retrieval, dialog systems, plagiarism detection, and query ranking. STS has been widely studied in the general English domain. However, there exist few resources for STS tasks in the clinical domain and in languages other than English, such as Japanese. OBJECTIVE The objective of this study is to capture semantic similarity between Japanese clinical texts (Japanese clinical STS) by creating a Japanese dataset that is publicly available. MATERIALS We created two datasets for Japanese clinical STS: (1) Japanese case reports (CR dataset) and (2) Japanese electronic medical records (EMR dataset). The CR dataset was created from publicly available case reports extracted from the CiNii database. The EMR dataset was created from Japanese electronic medical records. METHODS We used an approach based on bidirectional encoder representations from transformers (BERT) to capture the semantic similarity between the clinical domain texts. BERT is a popular approach for transfer learning and has been proven to be effective in achieving high accuracy for small datasets. We implemented two Japanese pretrained BERT models: a general Japanese BERT and a clinical Japanese BERT. The general Japanese BERT is pretrained on Japanese Wikipedia texts while the clinical Japanese BERT is pretrained on Japanese clinical texts. RESULTS The BERT models performed well in capturing semantic similarity in our datasets. The general Japanese BERT outperformed the clinical Japanese BERT and achieved a high correlation with human score (0.904 in the CR dataset and 0.875 in the EMR dataset). It was unexpected that the general Japanese BERT outperformed the clinical Japanese BERT on the clinical domain datasets. 
This could be due to the fact that the general Japanese BERT is pretrained on a wide range of texts compared with the clinical Japanese BERT.
Affiliation(s)
- Faith Wavinya Mutinda
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Shuntaro Yada
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Shoko Wakamiya
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Eiji Aramaki
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Nara, Japan
|
18
|
Abstract
Fake news can have a significant negative impact on society because of the growing use of mobile devices and the worldwide increase in Internet access. It is therefore essential to develop a simple mathematical model to understand the online dissemination of fake news. In this study, we propose a point process model of the spread of fake news on Twitter. The proposed model describes the spread of a fake news item as a two-stage process: initially, fake news spreads as a piece of ordinary news; then, when most users start recognizing the falsity of the news item, that itself spreads as another news story. We validate this model using two datasets of fake news items spread on Twitter. We show that the proposed model is superior to the current state-of-the-art methods in accurately predicting the evolution of the spread of a fake news item. Moreover, a text analysis suggests that our model appropriately infers the correction time, i.e., the moment when Twitter users start realizing the falsity of the news item. The proposed model contributes to understanding the dynamics of the spread of fake news on social media. Its ability to extract a compact representation of the spreading pattern could be useful in the detection and mitigation of fake news.
Affiliation(s)
- Taichi Murayama
- Nara Institute of Science and Technology (NAIST), Ikoma, Japan
- Shoko Wakamiya
- Nara Institute of Science and Technology (NAIST), Ikoma, Japan
- Eiji Aramaki
- Nara Institute of Science and Technology (NAIST), Ikoma, Japan
- Ryota Kobayashi
- The University of Tokyo, Tokyo, Japan
- JST PRESTO, Kawaguchi, Japan
|
19
|
Ujiie S, Yada S, Wakamiya S, Aramaki E. Identification of Adverse Drug Event-Related Japanese Articles: Natural Language Processing Analysis. JMIR Med Inform 2020; 8:e22661. [PMID: 33245290 PMCID: PMC7732716 DOI: 10.2196/22661] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/21/2020] [Revised: 10/05/2020] [Accepted: 10/28/2020] [Indexed: 12/23/2022]
Abstract
Background Medical articles covering adverse drug events (ADEs) are systematically reported by pharmaceutical companies for drug safety information purposes. Although policies governing reporting to regulatory bodies vary among countries and regions, all medical article reporting may be categorized as precision or recall based. Recall-based reporting, which is implemented in Japan, requires the reporting of any possible ADE. Therefore, recall-based reporting can introduce numerous false positives or substantial amounts of noise, a problem that is difficult to address using limited manual labor. Objective Our aim was to develop an automated system that could identify ADE-related medical articles, support recall-based reporting, and alleviate manual labor in Japanese pharmaceutical companies. Methods Using medical articles as input, our system based on natural language processing applies document-level classification to extract articles containing ADEs (replacing manual labor in the first screening) and sentence-level classification to extract sentences within those articles that imply ADEs (thus supporting experts in the second screening). We used 509 Japanese medical articles annotated by a medical engineer to evaluate the performance of the proposed system. Results Document-level classification yielded an F1 of 0.903. Sentence-level classification yielded an F1 of 0.413. These were averages of fivefold cross-validations. Conclusions A simple automated system may alleviate the manual labor involved in screening drug safety–related medical articles in pharmaceutical companies. After improving the accuracy of the sentence-level classification by considering a wider context, we intend to apply this system toward real-world postmarketing surveillance.
Affiliation(s)
- Shogo Ujiie
- Nara Institute of Science and Technology, Nara, Japan
- Shuntaro Yada
- Nara Institute of Science and Technology, Nara, Japan
- Eiji Aramaki
- Nara Institute of Science and Technology, Nara, Japan
|
20
|
Hisada S, Murayama T, Tsubouchi K, Fujita S, Yada S, Wakamiya S, Aramaki E. Surveillance of early stage COVID-19 clusters using search query logs and mobile device-based location information. Sci Rep 2020; 10:18680. [PMID: 33122686 PMCID: PMC7596075 DOI: 10.1038/s41598-020-75771-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/10/2020] [Accepted: 10/01/2020] [Indexed: 12/18/2022]
Abstract
Two clusters of the coronavirus disease 2019 (COVID-19) were confirmed in Hokkaido, Japan, in February 2020. To identify these clusters, this study employed web search query logs of multiple devices and user location information from location-aware mobile devices. We anonymously identified users who used a web search engine (i.e., Yahoo! JAPAN) to search for COVID-19 or its symptoms. We regarded them as web searchers who were suspicious of their own COVID-19 infection (WSSCI). We extracted the location of WSSCI via a mobile operating system application and compared the spatio-temporal distribution of WSSCI with the actual location of the two known clusters. In the early stage of cluster development, we confirmed several WSSCI. Our approach was accurate in this stage and became biased after a public announcement of the cluster development. When other cluster-related resources, such as detailed population statistics, are not available, the proposed metric can capture hints of emerging clusters.
Affiliation(s)
- Shohei Hisada
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Taichi Murayama
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shuntaro Yada
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shoko Wakamiya
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Eiji Aramaki
- Nara Institute of Science and Technology (NAIST), Nara, Japan
|
21
|
Murayama T, Shimizu N, Fujita S, Wakamiya S, Aramaki E. Robust two-stage influenza prediction model considering regular and irregular trends. PLoS One 2020; 15:e0233126. [PMID: 32437380 PMCID: PMC7241782 DOI: 10.1371/journal.pone.0233126] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/18/2019] [Accepted: 04/28/2020] [Indexed: 11/18/2022]
Abstract
Influenza causes numerous deaths worldwide every year. Predicting the number of influenza patients is an important task for medical institutions. Two types of data regarding influenza-like illnesses (ILIs) are often used for flu prediction: (1) historical data and (2) user generated content (UGC) data on the web such as search queries and tweets. Historical data work well for regular seasonal patterns but poorly for irregular phenomena. In contrast, UGC data are advantageous for irregular phenomena. So far, no effective model providing the benefits of both types of data has been devised. This study proposes a novel model, designated the two-stage model, which combines both historical and UGC data. The basic idea is that regular trends are first estimated using the historical data-based model, and irregular trends are then predicted by the UGC data-based model. Our approach is practically useful because the two models can be trained separately. Thus, if a UGC provider changes the service, our model could produce better performance because the first part of the model is still stable. Experiments on the US and Japan datasets demonstrated the basic feasibility of the proposed approach. In the dropout (pseudo-noise) test that assumes a UGC service would change, the proposed method also showed robustness against outliers. The proposed model is suitable for prediction of seasonal flu.
Affiliation(s)
- Taichi Murayama
- Nara Institute of Science and Technology (NAIST), Ikoma-city, Japan
- Shoko Wakamiya
- Nara Institute of Science and Technology (NAIST), Ikoma-city, Japan
- Eiji Aramaki
- Nara Institute of Science and Technology (NAIST), Ikoma-city, Japan
|
22
|
Aramaki E, Honda C, Wakamiya S, Sato A, Miyashiro I. Quick Cognitive Impairment Test for Cancer Patients Using Emotional Stroop Effect. Stud Health Technol Inform 2019; 264:1629-1630. [PMID: 31438264 DOI: 10.3233/shti190568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Recent studies have attributed impaired cognitive function in cancer patients, or Cancer Related Cognitive Impairment (CRCI), to various causes. CRCI screening is vital for guiding important decisions about treatment options. This study investigates the emotional Stroop-test-based CRCI screening, examining response time when naming the colors of negative emotional words. Cancer patients (n=17) participated in two tests: (1) the Stroop task; (2) State-Trait Anxiety Inventory. Results suggest that Stroop-based CRCI screening is feasible.
Affiliation(s)
- Eiji Aramaki
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Chihiro Honda
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shoko Wakamiya
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Akira Sato
- Osaka International Cancer Institute, Osaka, Japan
|
23
|
Aramaki E, Miyabe M, Honda C, Isozaki S, Wakamiya S, Sato A, Miyashiro I. KOTOBAKARI Study: Using Natural Language Processing of Patient Short Narratives to Detect Cancer Related Cognitive Impairment. Stud Health Technol Inform 2019; 264:1111-1115. [PMID: 31438097 DOI: 10.3233/shti190398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
BACKGROUND Recent studies have reported that the cognitive function of cancer patients often declines, a phenomenon designated as cancer related cognitive impairment (CRCI). Detecting CRCI is important for patients' decision-making. To do so, this study uses language-based CRCI screening to examine participants' language ability. OBJECTIVE This study was conducted to ascertain whether a Natural Language Processing (NLP) based system can detect CRCI. MATERIALS AND METHODS We obtained materials of two types from cancer patients (n = 116): (1) speech samples on three topics, and (2) cognitive function level test scores from Hasegawa's Dementia Scale - Revised (HDS-R), a test used in Japan for dementia patients. The test is similar to the Mini-Mental State Examination. RESULTS AND DISCUSSION Cancer patients with lower HDS-R scores showed a significantly lower Type Token Ratio (TTR). CONCLUSION This result demonstrates the feasibility of the proposed speech-language-based CRCI screening method.
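The Type Token Ratio (TTR) named in the abstract above is the number of distinct word forms divided by the total number of words. A minimal sketch follows, assuming the transcript has already been split into tokens; the function name and the sample tokens are hypothetical, and the abstract does not describe how the study tokenized the speech samples.

```python
def type_token_ratio(tokens):
    """Distinct tokens (types) divided by total tokens; higher means more varied vocabulary."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty token list")
    return len(set(tokens)) / len(tokens)


# Hypothetical pre-tokenized utterance, invented for illustration only.
tokens = ["i", "went", "to", "the", "hospital", "and", "i", "went", "home"]
print(round(type_token_ratio(tokens), 3))
```

TTR tends to fall as a text gets longer, so comparisons are usually made between samples of similar length.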
Affiliation(s)
- Eiji Aramaki
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Mai Miyabe
- Suwa University of Science, Nagano, Japan
- Chihiro Honda
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Seiko Isozaki
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shoko Wakamiya
- Nara Institute of Science and Technology (NAIST), Nara, Japan
- Akira Sato
- Osaka International Cancer Institute, Osaka, Japan
|
24
|
Wakamiya S, Morita M, Kano Y, Ohkuma T, Aramaki E. Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations. J Med Internet Res 2019; 21:e12783. [PMID: 30785407 PMCID: PMC6401666 DOI: 10.2196/12783] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/09/2018] [Revised: 12/12/2018] [Accepted: 12/13/2018] [Indexed: 11/13/2022]
Abstract
Background The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Then, participants classify each tweet into 1 of the 2 categories: those containing a patient’s symptom and those that do not. Objective This study aimed to present the results of groups participating in a Japanese subtask, English subtask, and Chinese subtask along with discussions, to clarify the issues that need to be resolved in the field of medical NLP. Methods In summary, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In total, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using the exact match accuracy, F-measure based on precision and recall, and Hamming loss. Results The best system achieved 0.880 exact match accuracy, 0.920 F-measure, and 0.019 Hamming loss. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032, respectively. Conclusions This paper presented and discussed the performance of systems participating in the NTCIR-13 MedWeb task. 
As the MedWeb task settings can be formalized as the factualization of text, the achievement of this task could be directly applied to practical clinical applications.
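The three evaluation measures named in the abstract above can be illustrated with a minimal sketch. The toy label vectors (3 labels rather than the task's 8), the function names, and the choice of micro-averaging for the F-measure are assumptions made for illustration; the abstract does not state how the task's F-measure was averaged.

```python
from typing import Sequence

Labels = Sequence[Sequence[int]]  # one 0/1 vector per tweet


def exact_match_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of tweets whose entire label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return wrong / total


def micro_f_measure(y_true: Labels, y_pred: Labels) -> float:
    """F-measure from precision and recall pooled over all labels (assumed micro-averaging)."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        for ti, pi in zip(t, p):
            if ti == 1 and pi == 1:
                tp += 1
            elif ti == 0 and pi == 1:
                fp += 1
            elif ti == 1 and pi == 0:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted labels for 4 tweets and 3 symptom labels.
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1]]

print(exact_match_accuracy(y_true, y_pred))  # 0.5
print(hamming_loss(y_true, y_pred))          # 0.25
print(round(micro_f_measure(y_true, y_pred), 3))  # 0.667
```

Exact match accuracy is the strictest of the three because a single wrong label makes the whole tweet count as an error, whereas Hamming loss gives partial credit per label.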
Affiliation(s)
- Shoko Wakamiya
- Institute for Research Initiatives, Nara Institute of Science and Technology, Ikoma, Japan; Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan; Data Science Center, Nara Institute of Science and Technology, Ikoma, Japan
- Eiji Aramaki
- Institute for Research Initiatives, Nara Institute of Science and Technology, Ikoma, Japan; Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan; Data Science Center, Nara Institute of Science and Technology, Ikoma, Japan
|
25
|
Wakamiya S, Matsune S, Okubo K, Aramaki E. Causal Relationships Among Pollen Counts, Tweet Numbers, and Patient Numbers for Seasonal Allergic Rhinitis Surveillance: Retrospective Analysis. J Med Internet Res 2019; 21:e10450. [PMID: 30785411 PMCID: PMC6401667 DOI: 10.2196/10450] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/20/2018] [Revised: 11/08/2018] [Accepted: 12/10/2018] [Indexed: 12/29/2022]
Abstract
Background Health-related social media data are increasingly used in disease-surveillance studies, which have demonstrated moderately high correlations between the number of social media posts and the number of patients. However, there is a need to understand the causal relationship between the behavior of social media users and the actual number of patients in order to increase the credibility of disease surveillance based on social media data. Objective This study aimed to clarify the causal relationships among pollen count, the posting behavior of social media users, and the number of patients with seasonal allergic rhinitis in the real world. Methods This analysis was conducted using datasets of pollen counts, tweet numbers, and numbers of patients with seasonal allergic rhinitis from Kanagawa Prefecture, Japan. We examined daily pollen counts for Japanese cedar (the major cause of seasonal allergic rhinitis in Japan) and hinoki cypress (which commonly complicates seasonal allergic rhinitis) from February 1 to May 31, 2017. The daily numbers of tweets that included the keyword “kafunshō” (or seasonal allergic rhinitis) were calculated between January 1 and May 31, 2017. Daily numbers of patients with seasonal allergic rhinitis from January 1 to May 31, 2017, were obtained from three healthcare institutes that participated in the study. The Granger causality test was used to examine the causal relationships among pollen count, tweet numbers, and the number of patients with seasonal allergic rhinitis from February to May 2017. To determine if time-variant factors affect these causal relationships, we analyzed the main seasonal allergic rhinitis phase (February to April) when Japanese cedar trees actively produce and release pollen. Results Increases in pollen count were found to increase the number of tweets during the overall study period (P=.04), but not the main seasonal allergic rhinitis phase (P=.05). In contrast, increases in pollen count were found to increase patient numbers in both the study period (P=.04) and the main seasonal allergic rhinitis phase (P=.01). Increases in the number of tweets increased the patient numbers during the main seasonal allergic rhinitis phase (P=.02), but not the overall study period (P=.89). Patient numbers did not affect the number of tweets in either the overall study period (P=.24) or the main seasonal allergic rhinitis phase (P=.47). Conclusions Understanding the causal relationships among pollen counts, tweet numbers, and numbers of patients with seasonal allergic rhinitis is an important step to increasing the credibility of surveillance systems that use social media data. Further in-depth studies are needed to identify the determinants of social media posts described in this exploratory analysis.
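The idea behind the Granger causality test used above can be sketched as a comparison of two regressions: one that predicts a series from its own past values, and one that also includes past values of the candidate cause. The sketch below assumes NumPy is available, uses synthetic data rather than the study's pollen, tweet, and patient series, and returns only the F statistic (no P value). The function name `granger_f_stat` and the lag of 1 are illustrative choices; the abstract does not report the lag order or software the authors used. A full test, including P values, is available in statsmodels as `grangercausalitytests`.

```python
import numpy as np


def granger_f_stat(cause, effect, lag=1):
    """F statistic for 'cause Granger-causes effect' with the given number of lags."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    n = len(effect)
    target = effect[lag:]

    # Column k holds the series shifted back by k time steps.
    own_lags = np.column_stack([effect[lag - k : n - k] for k in range(1, lag + 1)])
    cause_lags = np.column_stack([cause[lag - k : n - k] for k in range(1, lag + 1)])
    intercept = np.ones((len(target), 1))

    restricted = np.hstack([intercept, own_lags])                # own past only
    unrestricted = np.hstack([intercept, own_lags, cause_lags])  # own past + cause's past

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_den = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / df_den)


# Synthetic example: y follows x with a one-step delay, but x does not depend on y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=199)

print("x -> y:", granger_f_stat(x, y))  # large F: past x helps predict y
print("y -> x:", granger_f_stat(y, x))  # small F: past y adds little
```

A large F statistic in one direction but not the other is the asymmetry the abstract reports, for example between pollen counts and tweet numbers.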
Affiliation(s)
- Shoko Wakamiya
- Institute for Research Initiatives, Nara Institute of Science and Technology, Ikoma, Japan; Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan; Data Science Center, Nara Institute of Science and Technology, Ikoma, Japan
- Shoji Matsune
- Musashikosugi Hospital, Nippon Medical School, Kawasaki, Japan
- Kimihiro Okubo
- Nippon Medical School Hospital, Nippon Medical School, Bunkyo, Japan
- Eiji Aramaki
- Institute for Research Initiatives, Nara Institute of Science and Technology, Ikoma, Japan; Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan; Data Science Center, Nara Institute of Science and Technology, Ikoma, Japan
|
26
|
Usui M, Aramaki E, Iwao T, Wakamiya S, Sakamoto T, Mochizuki M. Extraction and Standardization of Patient Complaints from Electronic Medication Histories for Pharmacovigilance: Natural Language Processing Analysis in Japanese. JMIR Med Inform 2018; 6:e11021. [PMID: 30262450 PMCID: PMC6231790 DOI: 10.2196/11021] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/12/2018] [Revised: 08/07/2018] [Accepted: 08/25/2018] [Indexed: 12/13/2022]
Abstract
BACKGROUND Despite the growing number of studies using natural language processing for pharmacovigilance, there are few reports on manipulating free text patient information in Japanese. OBJECTIVE This study aimed to establish a method of extracting and standardizing patient complaints from electronic medication histories accumulated in a Japanese community pharmacy for the detection of possible adverse drug event (ADE) signals. METHODS Subjective information included in electronic medication history data provided by a Japanese pharmacy operating in Hiroshima, Japan from September 1, 2015 to August 31, 2016, was used as patients' complaints. We formulated search rules based on morphological analysis and daily (nonmedical) speech and developed a system that automatically executes the search rules and annotates free text data with International Classification of Diseases, Tenth Revision (ICD-10) codes. The performance of the system was evaluated through comparisons with data manually annotated by health care workers for a data set of 5000 complaints. RESULTS Of 5000 complaints, the system annotated 2236 complaints with ICD-10 codes, whereas health care workers annotated 2348 statements. There was a match in the annotation of 1480 complaints between the system and manual work. System performance was .66 regarding precision, .63 in recall, and .65 for the F-measure. CONCLUSIONS Our results suggest that the system may be helpful in extracting and standardizing patients' speech related to symptoms from massive amounts of free text data, replacing manual work. After improving the extraction accuracy, we expect to utilize this system to detect signals of possible ADEs from patients' complaints in the future.
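The reported precision, recall, and F-measure can be reproduced from the three counts in the Results, assuming that precision is the number of matching annotations divided by the number of system annotations and recall is the number of matches divided by the number of manual annotations. That reading is an assumption, although it is consistent with the reported values. The function name is hypothetical.

```python
def precision_recall_f(matched: int, system_total: int, manual_total: int):
    """Precision, recall, and balanced F-measure from annotation counts."""
    precision = matched / system_total   # share of system annotations that match
    recall = matched / manual_total      # share of manual annotations recovered
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts taken from the abstract: 1480 matches, 2236 system, 2348 manual annotations.
p, r, f = precision_recall_f(matched=1480, system_total=2236, manual_total=2348)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
# prints "precision=0.66 recall=0.63 F=0.65"
```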
Affiliation(s)
- Misa Usui
- Division of Hospital Pharmacy Science, Graduate School of Pharmaceutical Sciences, Keio University, Tokyo, Japan
- Eiji Aramaki
- Social Computing Lab, Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Tomohide Iwao
- Social Computing Lab, Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Shoko Wakamiya
- Social Computing Lab, Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Mayumi Mochizuki
- Division of Hospital Pharmacy Science, Faculty of Pharmacy, Keio University, Tokyo, Japan; Department of Pharmacy, Keio University Hospital, Tokyo, Japan
|
27
|
Wakamiya S, Kawai Y, Aramaki E. Twitter-Based Influenza Detection After Flu Peak via Tweets With Indirect Information: Text Mining Study. JMIR Public Health Surveill 2018; 4:e65. [PMID: 30274968 PMCID: PMC6231889 DOI: 10.2196/publichealth.8627] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 08/02/2017] [Revised: 02/24/2018] [Accepted: 07/18/2018] [Indexed: 11/13/2022]
Abstract
Background The recent rise in popularity and scale of social networking services (SNSs) has resulted in an increasing need for SNS-based information extraction systems. A popular application of SNS data is health surveillance for predicting an outbreak of epidemics by detecting diseases from text messages posted on SNS platforms. Such applications share the following logic: they incorporate SNS users as social sensors. These social sensor–based approaches also share a common problem: SNS-based surveillance is much more reliable when sufficient numbers of users are active, whereas small or inactive populations produce inconsistent results. Objective This study proposes a novel approach to estimate the trend of patient numbers using indirect information covering both urban areas and rural areas within the posts. Methods We presented a TRAP model by embedding both direct information and indirect information. A collection of tweets spanning 3 years (7 million influenza-related tweets in Japanese) was used to evaluate the model. Both direct information and indirect information that mention other places were used. As indirect information is less reliable (too noisy or too old) than direct information, the indirect information data were not used directly and were considered as inhibiting direct information. For example, when indirect information appeared often, it was considered as signifying that everyone already had a known disease, leading to a small amount of direct information. Results The estimation performance of our approach was evaluated using the correlation coefficient between the number of influenza cases as the gold standard values and the estimated values by the proposed models. The results revealed that the baseline model (BASELINE+NLP) achieved a correlation of .36 and that the proposed model (TRAP+NLP) improved on it (.70, +.34 points). Conclusions The proposed approach by which the indirect information inhibits direct information exhibited improved estimation performance not only in rural cities but also in urban cities, which demonstrated the effectiveness of the proposed method consisting of a TRAP model and natural language processing (NLP) classification.
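The evaluation described in the Results compares estimated values with gold standard case counts through a correlation coefficient. A minimal sketch follows, assuming the Pearson coefficient is meant (the abstract does not name the variant). The weekly numbers are invented for illustration and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly influenza case counts and model estimates.
reported_cases = [12, 30, 55, 80, 60, 25]
estimated_cases = [10, 28, 50, 70, 65, 30]
print(round(pearson_r(reported_cases, estimated_cases), 3))
```

A value near 1 means the estimates rise and fall with the reported counts, which is the sense in which the abstract compares .36 for the baseline with .70 for the proposed model.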
Affiliation(s)
- Yukiko Kawai
- Kyoto Sangyo University, Kyoto, Japan; Osaka University, Osaka, Japan
- Eiji Aramaki
- Nara Institute of Science and Technology, Ikoma, Japan
|
28
|
Aramaki E, Yano K, Wakamiya S. MedEx/J: A One-Scan Simple and Fast NLP Tool for Japanese Clinical Texts. Stud Health Technol Inform 2017; 245:285-288. [PMID: 29295100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Because of the recent replacement of physical documents with electronic medical records (EMR), the importance of information processing in the medical field has increased. In light of this trend, we have been developing MedEx/J, which retrieves important Japanese language information from medical reports. MedEx/J executes two tasks simultaneously: (1) term extraction, and (2) positive and negative event classification. We designate this approach as a one-scan approach, providing simplicity of systems and reasonable accuracy. MedEx/J performance on the two tasks is described herein: (1) term extraction (F_{β=1} = 0.87) and (2) positive-negative classification (F_{β=1} = 0.63). This paper also presents discussion and explains remaining issues in the medical natural language processing field.
Affiliation(s)
- Eiji Aramaki
- Nara Institute of Science and Technology (NAIST), Japan
- Ken Yano
- Nara Institute of Science and Technology (NAIST), Japan
|
29
|
Iso H, Wakamiya S, Aramaki E. Conditional Density Estimation of Tweet Location: A Feature-Dependent Approach. Stud Health Technol Inform 2017; 245:408-411. [PMID: 29295126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Twitter-based public health surveillance systems have achieved many successes. Underlying this success is the useful information associated with tweets, such as temporal and spatial information. For fine-grained investigation of disease propagation, this information plays an even more important role. Unlike temporal information, which is always available, spatial information is less available because of privacy concerns. To extend the availability of spatial information, many geographic identification systems have been developed. However, for most tweets the origin of the user location cannot be identified, even if a human reads the tweet contents. This study estimates the geographic origin of tweets, together with the reliability of the estimate, using a density estimation approach. Our method reveals how the model interprets the origin of user location according to the spread of estimated density.
Affiliation(s)
- Hayate Iso
- Nara Institute of Science and Technology (NAIST), Japan
- Eiji Aramaki
- Nara Institute of Science and Technology (NAIST), Japan
|
30
|
Wakamiya S, Lee R, Sumiya K. Crowd-Powered TV Viewing Rates: Measuring Relevancy between Tweets and TV Programs. Database Systems for Advanced Applications 2011. [DOI: 10.1007/978-3-642-20244-5_37] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/13/2023]
|
31
|
Abstract
An experimental study was conducted to determine whether external disturbance oscillations, such as those that could be created by handheld tools, alter the dynamic response characteristics of the human arm-muscle system. A special arm-test frame was used to induce external sinusoidal torque oscillations of various amplitudes and frequencies, while the reaction force and angular displacement were monitored. Frequency responses for two different output variables were determined using input/output cross-spectrum analysis. The angular displacement of the test frame and a component of hand reaction force were the output variables used, while the test frame torque was the input. Test results from one subject are presented in this paper. Changes in the magnitude and phase angle of the frequency responses were observed for different frequencies of the disturbance torque. These changes indicate that the stability margin and response amplitude of the human arm-muscle system do change as a function of the frequency and amplitude of external disturbance oscillations. This suggests that at certain operating frequencies handheld tools can induce large reaction amplitudes or even loss of control.
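Input/output cross-spectrum analysis, as mentioned above, estimates a frequency response as the cross-spectrum between input and output divided by the auto-spectrum of the input. The sketch below assumes NumPy is available and uses a single FFT segment with no averaging or windowing, which is a simplification. The signal names, sampling rate, gain, and phase lag are hypothetical and are not measurements from the study.

```python
import numpy as np


def frequency_response(input_signal, output_signal, fs):
    """Estimate H(f) = S_xy(f) / S_xx(f) from one segment of input and output data."""
    x = np.asarray(input_signal, dtype=float)
    y = np.asarray(output_signal, dtype=float)
    X = np.fft.rfft(x)
    Y = np.fft.rfft(y)
    cross_spectrum = np.conj(X) * Y      # S_xy, up to a common scale factor
    auto_spectrum = (np.conj(X) * X).real  # S_xx, same scale factor
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Bins with almost no input energy give meaningless ratios; silence the warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        H = cross_spectrum / auto_spectrum
    return freqs, H


# Synthetic example: a 5 Hz torque input and a displacement output with
# twice the amplitude and a 45 degree phase lag (all values hypothetical).
fs = 100.0
t = np.arange(1000) / fs
torque = np.sin(2 * np.pi * 5.0 * t)
displacement = 2.0 * np.sin(2 * np.pi * 5.0 * t - np.pi / 4)

freqs, H = frequency_response(torque, displacement, fs)
k = int(np.argmin(np.abs(freqs - 5.0)))  # bin at the disturbance frequency
print("gain:", abs(H[k]), "phase (rad):", np.angle(H[k]))
```

Only the bin at the excitation frequency is meaningful here, since a pure sinusoid carries no energy elsewhere. The magnitude and phase read off at that bin correspond to the quantities the abstract tracks across disturbance frequencies.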
|