1
Cam H, Cam AV, Demirel U, Ahmed S. Sentiment analysis of financial Twitter posts on Twitter with the machine learning classifiers. Heliyon 2024; 10:e23784. [PMID: 38205287 PMCID: PMC10776998 DOI: 10.1016/j.heliyon.2023.e23784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 12/05/2023] [Accepted: 12/13/2023] [Indexed: 01/12/2024]
Abstract
This paper presents a sentiment analysis combining lexicon-based and machine learning (ML)-based approaches in Turkish to investigate the public mood for the prediction of stock market behavior in BIST30, Borsa Istanbul. The main motivation behind this study is to apply sentiment analysis to finance-related tweets in Turkish. We import 17189 tweets posted with the hashtags "#Borsaistanbul, #Bist, #Bist30, #Bist100" on Twitter between November 7, 2022, and November 15, 2022, via MAXQDA 2020, a qualitative data analysis program. For the lexicon-based side, we use the multilingual sentiment method offered by the Orange program to label the polarities of the 17189 samples as positive, negative, or neutral. Neutral samples are discarded for the machine learning experiments. For the machine learning side, we select 9076 positive and negative samples to implement the classification problem with six different supervised machine learning classifiers in Python 3.6 with the sklearn library. In the experiments, 80% of the selected data is used for the training phase and the rest is used for the testing and validation phase. The results show that the Support Vector Machine and Multilayer Perceptron classifiers perform better than the other classifiers, with accuracies of 0.89 and 0.88 and AUC values of 0.8729 and 0.8647, respectively. The other classifiers obtain an accuracy rate of approximately 78.5%. It may be possible to increase sentiment analysis accuracy with parameter optimization on a larger, cleaner, and more balanced dataset and by changing the pre-processing steps. This work can be expanded in the future to develop better sentiment analysis using deep learning approaches.
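The labeling, filtering, and splitting steps described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny English `LEXICON`, the sample tweets, and the helper names are hypothetical stand-ins, whereas the study labeled Turkish tweets with the Orange program and trained classifiers with sklearn.

```python
import random

# Hypothetical toy lexicon (assumption); the study used Orange's multilingual tool.
LEXICON = {"gain": 1, "record": 1, "rally": 1, "loss": -1, "crisis": -1, "drop": -1}


def label_polarity(text):
    """Sum the word scores and map the total to a polarity label."""
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample tweets (assumption), not data from the study.
tweets = [
    "bist30 gain record", "bist100 loss", "market opens today",
    "crisis and loss", "record gain", "bist gain", "index unchanged",
    "crisis deepens", "strong rally gain", "sharp drop loss",
    "gain then loss", "new record", "drop continues",
]

labeled = [(tweet, label_polarity(tweet)) for tweet in tweets]
# Neutral samples are discarded before the supervised experiments.
binary = [(tweet, label) for tweet, label in labeled if label != "neutral"]
train, test = train_test_split(binary, train_fraction=0.8)
```

The fixed seed makes the split reproducible; a stratified split would additionally keep the positive-to-negative ratio equal in both parts, which the abstract does not specify.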
Affiliation(s)
- Handan Cam
  - Department of Management Information Systems, Faculty of Economic and Administrative Science, Gumushane University, 29000, Gumushane, Turkey
- Alper Veli Cam
  - Department of Health Care Management, Faculty of Health Sciences, Gumushane University, 29000, Gumushane, Turkey
- Ugur Demirel
  - Irfan Can Kose Vocational School, Gumushane University, 29000, Gumushane, Turkey
- Sana Ahmed
  - Henley Business School, University of Reading, Reading, RG6 6AH, UK
2
Darmawan G, Handoko B, Faidah DY, Islamiaty D. Improving the Forecasting Accuracy Based on the Lunar Calendar in Modeling Rainfall Levels Using the Bi-LSTM Method through the Grid Search Approach. ScientificWorldJournal 2023; 2023:1863346. [PMID: 38189057 PMCID: PMC10771920 DOI: 10.1155/2023/1863346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/26/2023] [Accepted: 12/07/2023] [Indexed: 01/09/2024]
Abstract
Rainfall is one of the climatic factors that influence various human activities and affect everyday decision making. High-intensity rainfall can become a threat and cause serious problems, including various natural disasters. Therefore, rainfall forecasting is essential to anticipate such events and enable preventive action, and it can inform decisions aimed at increasing the productivity and mobility of human activities. The aim of this study is to compare rainfall forecasting accuracy between the Gregorian and the lunar calendars using the bidirectional long short-term memory (Bi-LSTM) machine learning model through the grid search approach. This method was used because, with the right parameters, it can capture patterns arising from the simultaneous effects of two asynchronous calendars, Gregorian and lunar. Monthly rainfall data from Bogor City, Indonesia, from 2001 to 2022 were used. The results show that the MAPE for the lunar calendar, at 14.82%, is smaller than that for the Gregorian calendar, at 35.12%, which indicates better forecasting ability.
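The comparison above rests on the mean absolute percentage error (MAPE). As a reminder of what the metric measures, here is a minimal sketch using the standard definition; the function name and the example numbers are illustrative assumptions, not values from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Standard definition: 100/n * sum(|(a - f) / a|).
    Undefined when any actual value is zero, so that case is rejected.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical monthly rainfall values in millimetres (assumption).
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
error_percent = mape(observed, predicted)
```

Because the metric divides by the observed value, months with zero rainfall have to be handled separately before it can be applied to a rainfall series.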
Affiliation(s)
- Gumgum Darmawan
  - Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Bandung-Sumedang Km 21 Jatinangor, Sumedang 45363, Indonesia
- Budhi Handoko
  - Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Bandung-Sumedang Km 21 Jatinangor, Sumedang 45363, Indonesia
- Defi Yusti Faidah
  - Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Bandung-Sumedang Km 21 Jatinangor, Sumedang 45363, Indonesia
- Dian Islamiaty
  - Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Bandung-Sumedang Km 21 Jatinangor, Sumedang 45363, Indonesia
3
Shin H, Yuniar CT, Oh S, Purja S, Park S, Lee H, Kim E. The Adverse Effects and Nonmedical Use of Methylphenidate Before and After the Outbreak of COVID-19: Machine Learning Analysis. J Med Internet Res 2023; 25:e45146. [PMID: 37585250 PMCID: PMC10468706 DOI: 10.2196/45146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Revised: 05/09/2023] [Accepted: 06/28/2023] [Indexed: 08/17/2023]
Abstract
BACKGROUND Methylphenidate is an effective first-line treatment for attention-deficit/hyperactivity disorder (ADHD). However, many adverse effects of methylphenidate have been recorded from randomized clinical trials and patient-reported outcomes, but it is difficult to determine abuse from them. In the context of COVID-19, it is important to determine how drug use evaluation, as well as misuse of drugs, have been affected by the pandemic. As people share their reasons for using medication, patient sentiments, and the effects of medicine on social networking services (SNSs), the application of machine learning and SNS data can be a method to overcome the limitations. Proper machine learning models could be evaluated to validate the effects of the COVID-19 pandemic on drug use.
OBJECTIVE To analyze the effect of the COVID-19 pandemic on the use of methylphenidate, this study analyzed the adverse effects and nonmedical use of methylphenidate and evaluated the change in frequency of nonmedical use based on SNS data before and after the outbreak of COVID-19. Moreover, the performance of 4 machine learning models for classifying methylphenidate use based on SNS data was compared.
METHODS In this cross-sectional study, SNS data on methylphenidate from Twitter, Facebook, and Instagram from January 2019 to December 2020 were collected. The frequency of adverse effects, nonmedical use, and drug use before and after the COVID-19 pandemic were compared and analyzed. Interrupted time series analysis about the frequency and trends of nonmedical use of methylphenidate was conducted for 24 months from January 2019 to December 2020. Using the labeled training data set and features, the following 4 machine learning models were built using the data, and their performance was evaluated using F-1 scores: naïve Bayes classifier, random forest, support vector machine, and long short-term memory.
RESULTS This study collected 146,352 data points and detected that 4.3% (6340/146,352) were firsthand experience data. Psychiatric problems (521/1683, 31%) had the highest frequency among the adverse effects. The highest frequency of nonmedical use was for studies or work (741/2016, 36.8%). While the frequency of nonmedical use before and after the outbreak of COVID-19 has been similar (odds ratio [OR] 1.02, 95% CI 0.91-1.15), its trend has changed significantly due to the pandemic (95% CI 2.36-22.20). Among the machine learning models, random forest (RF) had the highest performance of 0.75.
CONCLUSIONS The trend of nonmedical use of methylphenidate has changed significantly due to the COVID-19 pandemic. Among the machine learning models using SNS data to analyze the adverse effects and nonmedical use of methylphenidate, the random forest model had the highest performance.
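The reported odds ratio (OR 1.02, 95% CI 0.91-1.15) has an interval that contains 1, which is why the before and after frequencies are described as similar. The sketch below shows one common way to compute an odds ratio and its interval from a 2x2 table, the log-scale (Woolf) method. The abstract does not say which method the authors used, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table (Woolf method).

    a, b: event and non-event counts in the first group
    c, d: event and non-event counts in the second group
    All four counts must be positive for the logarithm to be defined.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts (assumption), not taken from the study.
ratio, lower, upper = odds_ratio_ci(20, 80, 10, 90)
includes_one = lower <= 1.0 <= upper
```

An interval that includes 1 means the data are compatible with no difference in odds between the two groups at the chosen confidence level.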
Affiliation(s)
- Hocheol Shin
  - Evidence-Based Clinical Research Laboratory, Department of Health Science and Clinical Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- Cindra Tri Yuniar
  - Department of Pharmacology and Clinical Pharmacy, School of Pharmacy, Institut Teknologi Bandung, Bandung, Indonesia
- SuA Oh
  - Evidence-Based Clinical Research Laboratory, Department of Health Science and Clinical Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- Sujata Purja
  - Evidence-Based Clinical Research Laboratory, Department of Health Science and Clinical Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- Sera Park
  - Evidence-Based Clinical Research Laboratory, Department of Health Science and Clinical Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- Haeun Lee
  - Evidence-Based Clinical Research Laboratory, Department of Health Science and Clinical Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- Eunyoung Kim
  - Evidence-Based Clinical Research Laboratory, Department of Health Science and Clinical Pharmacy, Chung-Ang University, Seoul, Republic of Korea
  - Regulatory Science Pharmacy, College of Pharmacy, Chung-Ang University, Seoul, Republic of Korea
4
Qorib M, Oladunni T, Denis M, Ososanya E, Cotae P. COVID-19 Vaccine Hesitancy: A Global Public Health and Risk Modelling Framework Using an Environmental Deep Neural Network, Sentiment Classification with Text Mining and Emotional Reactions from COVID-19 Vaccination Tweets. International Journal of Environmental Research and Public Health 2023; 20:ijerph20105803. [PMID: 37239532 DOI: 10.3390/ijerph20105803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Revised: 03/31/2023] [Accepted: 04/04/2023] [Indexed: 05/28/2023]
Abstract
Popular social media platforms, such as Twitter, have become an excellent source of information with their swift information dissemination. Individuals with different backgrounds convey their opinions through social media platforms. Consequently, these platforms have become a profound instrument for collecting enormous datasets. We believe that compiling, organizing, exploring, and analyzing data from social media platforms, such as Twitter, can offer various perspectives to public health organizations and decision makers in identifying factors that contribute to vaccine hesitancy. In this study, public tweets were downloaded daily from Twitter using the Twitter API. Before performing computation, the tweets were preprocessed and labeled. Vocabulary normalization was based on stemming and lemmatization. The NRCLexicon technique was deployed to convert the tweets into ten classes: positive sentiment, negative sentiment, and eight basic emotions (joy, trust, fear, surprise, anticipation, anger, disgust, and sadness). A t-test was used to check the statistical significance of the relationships among the basic emotions. Our analysis shows that the p-values of the joy-sadness, trust-disgust, fear-anger, surprise-anticipation, and negative-positive relations are close to zero. Finally, neural network architectures, including 1DCNN, LSTM, Multiple-Layer Perceptron (MLP), and BERT, were trained and tested in a COVID-19 multi-classification of sentiments and emotions (positive, negative, joy, sadness, trust, disgust, fear, anger, surprise, and anticipation). Our experiment attained an accuracy of 88.6% for 1DCNN at 1744 s and 89.93% for LSTM at 27,597 s, while MLP achieved an accuracy of 84.78% at 203 s. The study results show that the BERT model performed the best, with an accuracy of 96.71% at 8429 s.
Affiliation(s)
- Miftahul Qorib
  - Department of Computer Science and Information Technology, University of the District of Columbia, Washington, DC 20008, USA
  - Department of Mathematics and Statistics, University of the District of Columbia, Washington, DC 20008, USA
- Timothy Oladunni
  - Department of Computer Science, Morgan State University, Baltimore, MD 21251, USA
- Max Denis
  - Department of Mechanical and Biomedical Engineering, University of the District of Columbia, Washington, DC 20008, USA
- Esther Ososanya
  - Department of Electrical and Computer Engineering, University of the District of Columbia, Washington, DC 20008, USA
- Paul Cotae
  - Department of Electrical and Computer Engineering, University of the District of Columbia, Washington, DC 20008, USA
5
Umair A, Masciari E, Ullah MH. Vaccine sentiment analysis using BERT + NBSVM and geo-spatial approaches. The Journal of Supercomputing 2023; 79:1-31. [PMID: 37359330 PMCID: PMC10164419 DOI: 10.1007/s11227-023-05319-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/17/2023] [Indexed: 06/28/2023]
Abstract
Since the spread of the coronavirus disease in 2019 (hereafter referred to as COVID-19), millions of people worldwide have been affected by the pandemic, which has significantly changed our habits in various ways. Efforts to eradicate the disease were greatly helped by unprecedentedly fast vaccine development, along with the adoption of strict preventive measures such as lockdowns. Worldwide provisioning of vaccines was therefore crucial to achieve maximum immunization of the population. However, the fast development of vaccines, driven by the urge to limit the pandemic, caused skeptical reactions in a large part of the population. More specifically, people's hesitancy about getting vaccinated was an additional obstacle in fighting COVID-19. To ameliorate this scenario, it is important to understand people's sentiments about vaccines in order to take proper actions to better inform the population. People continuously update their feelings and sentiments on social media, so a proper analysis of those opinions is an important challenge for providing proper information and avoiding misinformation. In more detail, sentiment analysis (Wankhade et al. in Artif Intell Rev 55(7):5731-5780, 2022. 10.1007/s10462-022-10144-1) is a powerful technique in natural language processing that enables the identification and classification of people's feelings, mainly in text data. It involves the use of machine learning algorithms and other computational techniques to analyze large volumes of text and determine whether they express positive, negative or neutral sentiment. Sentiment analysis is widely used in industries such as marketing, customer service, and healthcare, among others, to gain actionable insights from customer feedback, social media posts, and other forms of unstructured textual data.
In this paper, sentiment analysis is used to examine people's reactions to COVID-19 vaccines in order to provide useful insights that improve the understanding of their correct usage and possible advantages. A framework that leverages artificial intelligence (AI) methods is proposed for classifying tweets based on their polarity values. We analyzed Twitter data related to COVID-19 vaccines after applying the most appropriate pre-processing. More specifically, we identified the word cloud of negative, positive, and neutral words using an artificial intelligence tool to determine the sentiment of tweets. After this pre-processing step, we performed classification using the BERT + NBSVM model to classify people's sentiments about vaccines. The reason for combining bidirectional encoder representations from transformers (BERT) with Naive Bayes and support vector machine (NBSVM) can be understood by considering the limitation of BERT-based approaches, which only leverage encoder layers, resulting in lower performance on short texts like the ones used in our analysis. This limitation can be ameliorated by using Naive Bayes and support vector machine approaches, which are able to achieve higher performance in short-text sentiment analysis. Thus, we took advantage of both BERT features and NBSVM features to define a flexible framework for vaccine sentiment identification. Moreover, we enrich our results with spatial analysis of the data by using geo-coding, visualization, and spatial correlation analysis to suggest the most suitable vaccination centers to users based on the sentiment analysis outcomes. In principle, we do not need to implement a distributed architecture to run our experiments, as the available public data are not massive. However, we discuss a high-performance architecture that will be used if the collected data scales up dramatically.
We compared our approach with state-of-the-art methods using the most widely used metrics: accuracy, precision, recall and F-measure. The proposed BERT + NBSVM outperformed alternative models, achieving 73% accuracy, 71% precision, 88% recall and 73% F-measure for the classification of positive sentiments, and 73% accuracy, 71% precision, 74% recall and 73% F-measure for the classification of negative sentiments. These promising results are discussed in the following sections. The use of artificial intelligence methods and social media analysis can lead to a better understanding of people's reactions and opinions about any trending topic. In the case of health-related topics like COVID-19 vaccines, proper sentiment identification could be crucial for implementing public health policies. In more detail, the availability of useful findings on user opinions about vaccines can help policymakers design proper strategies and implement ad hoc vaccination protocols according to people's feelings, in order to provide better public service. To this end, we leveraged geospatial information to support effective recommendations for vaccination centers.
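The four metrics named in this abstract all derive from the counts of a binary confusion matrix. The following is a minimal sketch of the standard definitions; the function names and the counts are hypothetical and are not the study's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for one class from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for the "positive" class (assumption).
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
acc = accuracy(tp=8, tn=8, fp=2, fn=2)
```

Precision, recall and F-measure are computed per class, which is why the abstract reports separate figures for positive and negative sentiment, while accuracy is a single figure over all predictions.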
Affiliation(s)
- Areeba Umair
  - Department of Electrical Engineering and Information Technology, University of Naples Federico II, Via Claudio 21, 80125 Naples, Campania, Italy
- Elio Masciari
  - Department of Electrical Engineering and Information Technology, University of Naples Federico II, Via Claudio 21, 80125 Naples, Campania, Italy
- Muhammad Habib Ullah
  - Department of Electrical Engineering and Information Technology, University of Naples Federico II, Via Claudio 21, 80125 Naples, Campania, Italy
6
Maglietta R, Saccotelli L, Fanizza C, Telesca V, Dimauro G, Causio S, Lecci R, Federico I, Coppini G, Cipriano G, Carlucci R. Environmental variables and machine learning models to predict cetacean abundance in the Central-eastern Mediterranean Sea. Sci Rep 2023; 13:2600. [PMID: 36788321 PMCID: PMC9929343 DOI: 10.1038/s41598-023-29681-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Accepted: 02/08/2023] [Indexed: 02/16/2023]
Abstract
Although the Mediterranean Sea is a crucial hotspot in marine biodiversity, it has been threatened by numerous anthropogenic pressures. As flagship species, Cetaceans are exposed to those anthropogenic impacts and global changes. Assessing their conservation status becomes strategic to set effective management plans. The aim of this paper is to understand the habitat requirements of cetaceans, exploiting the advantages of a machine-learning framework. To this end, 28 physical and biogeochemical variables were identified as environmental predictors related to the abundance of three odontocete species in the Northern Ionian Sea (Central-eastern Mediterranean Sea). In fact, habitat models were built using sighting data collected for striped dolphins Stenella coeruleoalba, common bottlenose dolphins Tursiops truncatus, and Risso's dolphins Grampus griseus between July 2009 and October 2021. Random Forest was a suitable machine learning algorithm for the cetacean abundance estimation. Nitrate, phytoplankton carbon biomass, temperature, and salinity were the most common influential predictors, followed by latitude, 3D-chlorophyll and density. The habitat models proposed here were validated using sighting data acquired during 2022 in the study area, confirming the good performance of the strategy. This study provides valuable information to support management decisions and conservation measures in the EU marine spatial planning context.
Affiliation(s)
- Rosalia Maglietta
  - Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, National Research Council, via Amendola 122/D-I, 70126, Bari, Italy
- Leonardo Saccotelli
  - Ocean Predictions and Applications Division, Centro Euro-Mediterraneo sui Cambiamenti Climatici, Lecce, Italy
- Carmelo Fanizza
  - Jonian Dolphin Conservation, viale Virgilio 102, 74121, Taranto, Italy
- Vito Telesca
  - School of Engineering, University of Basilicata, viale Ateneo Lucano 10, 85100, Potenza, Italy
- Giovanni Dimauro
  - Department of Computer Science, University of Bari, via Orabona 4, 70125, Bari, Italy
- Salvatore Causio
  - Ocean Predictions and Applications Division, Centro Euro-Mediterraneo sui Cambiamenti Climatici, Lecce, Italy
- Rita Lecci
  - Ocean Predictions and Applications Division, Centro Euro-Mediterraneo sui Cambiamenti Climatici, Lecce, Italy
- Ivan Federico
  - Ocean Predictions and Applications Division, Centro Euro-Mediterraneo sui Cambiamenti Climatici, Lecce, Italy
- Giovanni Coppini
  - Ocean Predictions and Applications Division, Centro Euro-Mediterraneo sui Cambiamenti Climatici, Lecce, Italy
- Giulia Cipriano
  - Department of Biology, University of Bari, via Orabona 4, 70125, Bari, Italy
- Roberto Carlucci
  - Department of Biology, University of Bari, via Orabona 4, 70125, Bari, Italy
7
Ding C, Zhang Y, Ding T. A systematic hybrid machine learning approach for stress prediction. PeerJ Comput Sci 2023; 9:e1154. [PMID: 37346555 PMCID: PMC10280269 DOI: 10.7717/peerj-cs.1154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 10/21/2022] [Indexed: 06/23/2023]
Abstract
Stress is becoming an increasingly prevalent health issue, seriously affecting people and putting their health and lives at risk. Frustration, nervousness, and anxiety are symptoms of stress, and these symptoms are becoming common (40%) in younger people. Stress has a negative impact on human lives and damages the performance of each individual. Early prediction of stress and of its level can help to reduce its impact and the serious health issues related to this mental state. This requires automated systems that can accurately predict stress levels. This study proposes an approach that can detect stress accurately and efficiently using machine learning techniques. We propose a hybrid model (HB) that combines a gradient boosting machine (GBM) and a random forest (RF). These models are combined using soft voting, in which each model's prediction probability is used for the final prediction. The proposed model achieves 100% accuracy in comparison with state-of-the-art approaches. To show the significance of the proposed approach, we also performed 10-fold cross-validation, in which the proposed HB model achieved a mean accuracy of 1.00 with a standard deviation of +/-0.00. Finally, we performed a statistical t-test to show the significance of the proposed approach in comparison with other approaches.
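Soft voting, as described above, averages the class probabilities produced by each model and picks the class with the highest average. A minimal sketch of that rule follows; the probability values stand in for the outputs of a GBM and an RF and are hypothetical, as is the function name.

```python
def soft_vote(probabilities_per_model):
    """Average class probabilities across models and return the winning class.

    probabilities_per_model: one list of class probabilities per model,
    all for the same sample and in the same class order.
    Returns (predicted_class_index, averaged_probabilities).
    """
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    averaged = [
        sum(model[k] for model in probabilities_per_model) / n_models
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda k: averaged[k])
    return predicted, averaged


# Hypothetical outputs for one sample (assumption): [P(no stress), P(stress)].
gbm_probabilities = [0.6, 0.4]
rf_probabilities = [0.3, 0.7]
predicted, averaged = soft_vote([gbm_probabilities, rf_probabilities])
```

In this example the two models disagree, and the more confident one decides the outcome, which is the practical difference between soft voting and a plain majority vote over predicted labels.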
Affiliation(s)
- Cheng Ding
  - Emory University, Atlanta, GA, United States
- Yuhao Zhang
  - University of Nottingham, Nottingham, United Kingdom
- Ting Ding
  - East China University of Technology, Nanchang, China
8
Arbane M, Benlamri R, Brik Y, Alahmar AD. Social media-based COVID-19 sentiment classification model using Bi-LSTM. Expert Systems with Applications 2023; 212:118710. [PMID: 36060151 PMCID: PMC9425711 DOI: 10.1016/j.eswa.2022.118710] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/07/2021] [Revised: 06/26/2022] [Accepted: 08/25/2022] [Indexed: 06/15/2023]
Abstract
Internet public social media and forums provide a convenient channel for people concerned about public health issues, such as COVID-19, to share and discuss information/misinformation with each other. In this paper, we propose a natural language processing (NLP) method based on Bidirectional Long Short-Term Memory (Bi-LSTM) technique to perform sentiment classification and uncover various issues related to COVID-19 public opinions. Bi-LSTM is an improved version of conventional LSTMs for generating the output from both left and right contexts at each time step. We experimented with real datasets extracted from Twitter and Reddit social media platforms, and our experimental results showed improved metrics compared with the conventional LSTM model as well as recent studies available in the literature. The proposed model can be used by official institutions to mitigate the effects of negative messages and to understand peoples' concerns during the pandemic. Furthermore, our findings shed light on the importance of using NLP techniques to analyze public opinion and to combat the spreading of misinformation and to guide health decision-making.
Affiliation(s)
- Mohamed Arbane
  - LASS Laboratory, Mohamed Boudiaf University, M'sila, 28000, Algeria
- Rachid Benlamri
  - University of Doha for Science and Technology, Doha, PO Box 24449, Qatar
- Youcef Brik
  - LASS Laboratory, Mohamed Boudiaf University, M'sila, 28000, Algeria
- Ayman Diyab Alahmar
  - Department of Software Engineering, Lakehead University, Thunder Bay, P7B 5E1, Ontario, Canada
9
Aslan S, Kızıloluk S, Sert E. TSA-CNN-AOA: Twitter sentiment analysis using CNN optimized via arithmetic optimization algorithm. Neural Comput Appl 2023; 35:10311-10328. [PMID: 36714074 PMCID: PMC9867606 DOI: 10.1007/s00521-023-08236-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 01/06/2023] [Indexed: 01/21/2023]
Abstract
COVID-19, a novel virus from the coronavirus family, broke out in Wuhan city of China and spread all over the world, killing more than 5.5 million people. The speed at which this infectious disease spreads is still critical, and it causes more deaths with each passing day. The COVID-19 pandemic has had many different psychological effects on people's mental states, such as anxiety, fear, and similar complex feelings. Millions of people worldwide have shared their opinions on COVID-19 on several social media websites, particularly on Twitter. Therefore, it may be possible to minimize the negative psychological impact of the disease on society by obtaining individuals' views on COVID-19 from social media platforms, making deductions from their statements, and identifying negative statements about the disease. In this respect, Twitter sentiment analysis (TSA), a recently popular research topic, is used to perform data analysis on social media platforms such as Twitter and reach certain conclusions. The present study proposes a TSA approach using a convolutional neural network optimized via the arithmetic optimization algorithm (TSA-CNN-AOA). First, using a designed API, 173,638 tweets about COVID-19 were extracted from Twitter between July 25, 2020, and August 30, 2020, to create a database. Then, significant information was extracted from this database using FastText Skip-gram. The proposed approach uses a designed convolutional neural network (CNN) model as a feature extractor. The arithmetic optimization algorithm (AOA) was then used to apply a feature selection process to the features obtained from the CNN. Next, K-nearest neighbors (KNN), support vector machine, and decision tree classifiers were used to classify tweets as positive, negative, or neutral. In order to measure the TSA performance of the proposed method, it was compared with different approaches. The results demonstrated that TSA-CNN-AOA (KNN) achieved the highest tweet classification performance, with an accuracy rate of 95.098. The experimental studies show that the proposed approach displayed a much higher TSA performance than other similar approaches in the existing literature.
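The best-performing variant above classifies the selected features with K-nearest neighbors. The sketch below illustrates only that final step on hand-made two-dimensional points. In the study the inputs would be CNN features after AOA selection, which are not reproduced here, and all names and values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (assumption), three per class.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["negative", "negative", "negative", "positive", "positive", "positive"]

near_origin = knn_predict(points, labels, (0.2, 0.2), k=3)
near_far_cluster = knn_predict(points, labels, (5.5, 5.5), k=3)
```

An odd `k` avoids ties in the two-class case; with three classes, as in the study, ties remain possible and need an explicit rule.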
Affiliation(s)
- Serpil Aslan
  - Department of Software Engineering, Faculty of Engineering and Natural Sciences, Malatya Turgut Ozal University, 44210 Malatya, Turkey
- Soner Kızıloluk
  - Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Malatya Turgut Ozal University, 44210 Malatya, Turkey
- Eser Sert
  - Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Malatya Turgut Ozal University, 44210 Malatya, Turkey
10
Siddiqui HUR, de Abajo BS, Díez IDLT, Rustam F, Raza A, Atta S, Ashraf I. Predicting bankruptcy of firms using earnings call data and transfer learning. PeerJ Comput Sci 2023; 9:e1134. [PMID: 37346732 PMCID: PMC10280182 DOI: 10.7717/peerj-cs.1134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 09/27/2022] [Indexed: 06/23/2023]
Abstract
Business collapse is a common event in economies, small and big alike. A firm's health is crucial to its stakeholders, such as creditors, investors, and partners, and predicting an upcoming financial crisis is important for devising appropriate strategies to avoid business collapse. Bankruptcy prediction has been regarded as a critical topic in the world of accounting and finance. Methodologies and strategies have been investigated in the research domain for predicting company bankruptcy more promptly and accurately. Conventionally, financial risk and bankruptcy have been predicted solely from historic financial data. CEOs also communicate verbally via press releases, and, according to anecdotal evidence, voice characteristics such as emotion and tone may reflect a company's success. Companies' publicly available earnings call data is one of the main sources of information for understanding how businesses are doing and what the expectations are for the next quarters. An earnings call is a conference call between the management of a company and the media. During the call, management offers an overview of recent performance and provides a guide for the next quarter's expectations. The CEO's emotions can be extracted from the earnings call summary provided by the management using sentiment analysis. This article investigates the prediction of firms' health in terms of bankruptcy and non-bankruptcy based on emotions extracted from earnings calls and proposes a deep learning model in this regard. Features extracted from a long short-term memory (LSTM) network are used to train machine learning models. Results show that the models achieve a high score of 0.93 for both accuracy and F1 when trained on LSTM-extracted features from data balanced with the synthetic minority oversampling technique (SMOTE). LSTM features provide better performance than traditional bag-of-words and TF-IDF features.
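SMOTE, mentioned above as the balancing step, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that interpolation idea only, with hypothetical feature vectors; it is not the implementation used in the study.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic minority sample by SMOTE-style interpolation.

    minority: list of feature vectors from the minority class (at least two).
    k: number of nearest minority neighbours to choose from.
    """
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Nearest minority neighbours of the chosen base point, excluding itself.
    neighbours = sorted(
        (point for point in minority if point is not base),
        key=lambda point: math.dist(point, base),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # Position along the segment from base to neighbour.
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


# Hypothetical minority-class vectors (assumption), e.g. bankrupt firms.
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote_sample(minority_class, k=2, rng=random.Random(1))
```

Because each synthetic point lies on a segment between two real minority samples, oversampling is normally applied to the training split only, so that no interpolated data leaks into evaluation.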
Affiliation(s)
- Hafeez Ur Rehman Siddiqui
  - Faculty of Computer Science and Information Technology, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Beatriz Sainz de Abajo
  - Department of Signal Theory, Communications and Telematics Engineering, University of Valladolid, Spain
- Isabel de la Torre Díez
  - Department of Signal Theory, Communications and Telematics Engineering, University of Valladolid, Spain
- Furqan Rustam
  - School of Computer Science, University College Dublin, Dublin, Ireland
- Amjad Raza
  - Faculty of Computer Science and Information Technology, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Sajjad Atta
  - Faculty of Computer Science and Information Technology, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Imran Ashraf
  - Department of Information and Communication Engineering, Yeungnam University, Gyeongsan si, Republic of Korea
|
11
|
Swapnarekha H, Nayak J, Behera HS, Dash PB, Pelusi D. An optimistic firefly algorithm-based deep learning approach for sentiment analysis of COVID-19 tweets. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:2382-2407. [PMID: 36899539 DOI: 10.3934/mbe.2023112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
The unprecedented rise in the number of COVID-19 cases has drawn global attention, as it has caused an adverse impact on the lives of people all over the world. As of December 31, 2021, more than 286,901,222 people have been infected with COVID-19. The rise in the number of COVID-19 cases and deaths across the world has caused fear, anxiety and depression among individuals. Social media is the most dominant tool that disturbed human life during this pandemic. Among the social media platforms, Twitter is one of the most prominent and trusted. To control and monitor the COVID-19 infection, it is necessary to analyze the sentiments that people express on social media platforms. In this study, we proposed a deep learning approach, a long short-term memory (LSTM) model, for classifying COVID-19-related tweets as positive or negative in sentiment. In addition, the proposed approach makes use of the firefly algorithm to enhance the overall performance of the model. Further, the performance of the proposed model, along with other state-of-the-art ensemble and machine learning models, has been evaluated using performance metrics such as accuracy, precision, recall, the AUC-ROC and the F1-score. The experimental results reveal that the proposed LSTM + Firefly approach obtained a better accuracy of 99.59% when compared with the other state-of-the-art models.
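The firefly algorithm mentioned in this abstract is a population-based optimiser in which each candidate moves towards brighter (better) candidates, with an attraction that decays with distance. The abstract does not say which LSTM settings it tunes, so the sketch below is only a generic illustration: the function `firefly_minimise`, its default parameters, and the stand-in `objective` (a simple bowl over two abstract hyperparameters in [0, 1]) are all assumptions.

```python
import math
import random


def firefly_minimise(objective, dim, bounds, n_fireflies=15, n_iter=50,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Minimise `objective` with a basic firefly algorithm."""
    rng = random.Random(seed)
    low, high = bounds
    swarm = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_cost = min(cost)
    best_x = list(swarm[cost.index(best_cost)])

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower cost)
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction decays with distance
                    # Move i towards j, add a small random step, clip to the bounds.
                    swarm[i] = [
                        min(high, max(low, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:  # remember the best position seen so far
                        best_cost, best_x = cost[i], list(swarm[i])
    return best_x, best_cost


def objective(x):
    """Hypothetical stand-in for a validation loss; smallest at (0.3, 0.7)."""
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


best_x, best_cost = firefly_minimise(objective, dim=2, bounds=(0.0, 1.0))
print(best_x, best_cost)
```

In a real tuning setup, `objective` would train and validate a model for the given hyperparameters, which makes each evaluation expensive; the swarm size and iteration count would then be chosen with that cost in mind.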
Affiliation(s)
- H Swapnarekha
  - Department of Information Technology, Aditya Institute of Technology and Management (AITAM), Tekkali, Andhra Pradesh 532201, India
  - Department of Information Technology, Veer Surendra Sai University of Technology, Burla 768018, India
- Janmenjoy Nayak
  - Department of Computer Science, Maharaja Sriram Chandra Bhanja Deo University, Baripada, Odisha 757003, India
- H S Behera
  - Department of Information Technology, Veer Surendra Sai University of Technology, Burla 768018, India
- Pandit Byomakesha Dash
  - Department of Information Technology, Aditya Institute of Technology and Management (AITAM), Tekkali, Andhra Pradesh 532201, India
- Danilo Pelusi
  - Communication Sciences, University of Teramo, Coste Sant'agostino Campus, Teramo 64100, Italy
|
12
|
Sentimental and spatial analysis of COVID-19 vaccines tweets. J Intell Inf Syst 2023; 60:1-21. [PMID: 35462784 PMCID: PMC9012072 DOI: 10.1007/s10844-022-00699-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/23/2021] [Revised: 02/24/2022] [Accepted: 02/24/2022] [Indexed: 11/29/2022]
Abstract
The world has to face health concerns due to the wide spread of COVID. For this reason, the development of a vaccine is the need of the hour. The higher the vaccine distribution, the higher the immunity against coronavirus. Therefore, there is a need to analyse people's sentiment towards the vaccine campaign. Today, social media is a rich source of data where people share their opinions and experiences through posts, comments or tweets. In this study, we used Twitter data on COVID vaccines and analysed it using artificial intelligence and geo-spatial methods. We found the polarity of the tweets using the TextBlob() function and categorized them. Then, we designed word clouds and classified the sentiments using the BERT model. We then performed geo-coding and visualized the feature points on the world map. We found the correlation between the feature points geographically and then applied hotspot analysis and kernel density estimation to highlight the regions of positive, negative or neutral sentiment. We used precision, recall and F score to evaluate our model and compare our results with state-of-the-art methods. The results showed that our model achieved 55% and 54% precision, 69% and 85% recall, and 58% and 64% F score for the positive class and negative class respectively. Thus, these sentiment and spatial analyses help in worldwide pandemics by identifying people's attitudes towards vaccines.
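The per-class precision, recall and F score reported in this abstract follow the usual definitions, which the sketch below computes from gold and predicted labels. The labels are hypothetical and are not data from the study; the helper name `per_class_metrics` is likewise an assumption, and the abstract does not state how its figures were averaged.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and model predictions for eight tweets.
y_true = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "pos", "pos", "neu"]

for label in ("pos", "neg"):
    p, r, f = per_class_metrics(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reporting the three numbers per class, as the abstract does, makes it visible when a model trades precision for recall on one class, which a single accuracy figure would hide.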
|
13
|
Wadhwani GK, Varshney PK, Gupta A, Kumar S. Sentiment Analysis and Comprehensive Evaluation of Supervised Machine Learning Models Using Twitter Data on Russia-Ukraine War. SN COMPUTER SCIENCE 2023; 4:346. [PMID: 37125219 PMCID: PMC10120493 DOI: 10.1007/s42979-023-01790-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2023] [Accepted: 03/13/2023] [Indexed: 05/02/2023]
Abstract
The Russia-Ukraine War refers to the ongoing hostilities between Russia and Ukraine. When Russia started it in February 2014, the conflict initially focused on whether Crimea and the Donbass were formally recognised as being part of Ukraine. The conflict grew dramatically when Russia began its incursion into Ukraine on February 24, 2022, following a military build-up on the Russian-Ukrainian border that started in late 2021. Examining public perceptions of the crisis between Russia and Ukraine is the goal of this article. These days, social media has taken on a significant role in communication, and as a result, opinions can be found on platforms like Facebook, Twitter, and Instagram. The study uses 11,250 tweets about the war between Russia and Ukraine collected from Twitter. Techniques including image processing, object identification, and natural language processing have shown application, power, and potential for machine learning. The same applies to text analytics. For text analysis, sentiment analysis, and entity annotation, machine learning techniques are frequently employed. Given the applicability and efficacy of machine learning models, a natural language processing toolkit in Python is utilised to examine the textual polarity and subjectivity scores of tweets. Moreover, because machine learning models have a high degree of classification accuracy, they have been widely utilised to categorise emotions. We developed and tested models using three feature extraction techniques: TF-IDF (term frequency-inverse document frequency), BoW (bag of words), and N-gram. Each model was assessed using a number of important performance indicators, including accuracy, precision, recall, and F1 score. Results show that the extra trees classifier (ETC) model achieves the highest accuracy of 0.84 in combination with BoW features, where accuracy is a measure used to evaluate the efficacy of a machine learning algorithm. Comparisons with logistic regression (LR), decision tree (DT), support vector machine (SVM), XGB, Gaussian naive Bayes (GNB), ADA, and K-nearest neighbours (KNN) have also been made.
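The three feature extraction techniques named in this abstract (BoW, N-gram and TF-IDF) can be sketched in a few lines of standard-library Python. This is an illustration under assumptions: tokenisation is a plain lowercase whitespace split, the TF-IDF weight is the unsmoothed variant tf × log(N / df) (libraries often use smoothed variants), and the example tweets are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n=1):
    """Count n-gram occurrences per document (n=1 gives a plain bag of words)."""
    return [Counter(ngrams(doc.lower().split(), n)) for doc in docs]


def tf_idf(docs):
    """Weight each term by tf * log(N / df), one common unsmoothed variant."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    return [
        {term: (freq / sum(c.values())) * math.log(n_docs / df[term])
         for term, freq in c.items()}
        for c in counts
    ]


# Hypothetical example tweets, not taken from the study's dataset.
tweets = ["peace talks bring hope", "war brings fear", "hope for peace"]

bow = bag_of_words(tweets)
bigrams = bag_of_words(tweets, n=2)
weights = tf_idf(tweets)
print(bow[0]["peace"], bigrams[0]["peace talks"], round(weights[1]["war"], 3))
```

BoW keeps only counts, N-grams add short-range word order, and TF-IDF down-weights terms that occur in many documents; here "peace" appears in two of the three tweets and therefore receives a lower weight than "talks" within the first tweet.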
Affiliation(s)
- Anjali Gupta
  - Department of Computer Science, IITM, GGSIPU, New Delhi, India
- Shrawan Kumar
  - Department of Computer Science and Engineering, Shoolini University, Solan, Himachal Pradesh, India
|
14
|
Ghosh A, Umer S, Khan MK, Rout RK, Dhara BC. Smart sentiment analysis system for pain detection using cutting edge techniques in a smart healthcare framework. CLUSTER COMPUTING 2023; 26:119-135. [PMID: 35125934 PMCID: PMC8799976 DOI: 10.1007/s10586-022-03552-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/21/2021] [Revised: 01/12/2022] [Accepted: 01/13/2022] [Indexed: 05/04/2023]
Abstract
This paper proposes a sentiment analysis system for pain detection using cutting-edge techniques in a smart healthcare framework. The proposed system can detect pain sentiments by analyzing facial expressions. Its implementation is divided into four components. The first component detects the face region in the input image using a tree-structured part model. In the second component, statistical and deep learning-based feature analysis is performed to extract more valuable and distinctive patterns from the extracted facial region. In the third component, prediction models based on the statistical and deep feature analysis derive scores for the pain intensities (no-pain, low-pain, and high-pain) on the facial region. In the fourth component, the scores from the statistical and deep feature analysis are fused to enhance the performance of the proposed method. Two benchmark facial pain expression databases were employed during experimentation: the UNBC-McMaster shoulder pain database and the 2D Face-set database with Pain-expression. The performance on these databases has been compared with some existing state-of-the-art methods, and these comparisons show the superiority of the proposed system.
Affiliation(s)
- Anay Ghosh
  - Department of Computer Science & Engineering, University of Engineering & Management, Kolkata, 700156 India
- Saiyed Umer
  - Department of Computer Science & Engineering, Aliah University, Kolkata, 700156 India
- Muhammad Khurram Khan
  - Center of Excellence in Information Assurance, College of Computer and Information Sciences, King Saud University, Riyadh, 11451 Saudi Arabia
- Ranjeet Kumar Rout
  - Department of Computer Science & Engineering, National Institute of Technology, Srinagar, 190006 India
- Bibhas Chandra Dhara
  - Department of Information Technology, Jadavpur University, Kolkata, 700098 India
|
15
|
Agrawal S, Jain SK, Sharma S, Khatri A. COVID-19 Public Opinion: A Twitter Healthcare Data Processing Using Machine Learning Methodologies. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 20:432. [PMID: 36612755 PMCID: PMC9819913 DOI: 10.3390/ijerph20010432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 12/20/2022] [Accepted: 12/22/2022] [Indexed: 06/17/2023]
Abstract
The COVID-19 pandemic has shattered the whole world, and due to this, millions of people have posted their sentiments toward the pandemic on different social media platforms. This resulted in a huge information flow on social media and attracted many research studies aimed at extracting useful information to understand the sentiments. This paper analyses data imported from the Twitter API for the healthcare sector, emphasizing sub-domains, such as vaccines, post-COVID-19 health issues and healthcare service providers. The main objective of this research is to analyze machine learning models for classifying the sentiments of people and analyzing the direction of polarity by considering the views of the majority of people. The inferences drawn from this analysis may be useful for concerned authorities as they work to make appropriate policy decisions and strategic decisions. Various machine learning models were developed to extract the actual emotions, and results show that the support vector machine model outperforms with an average accuracy of 82.67% compared with the logistic regression, random forest, multinomial naïve Bayes and long short-term memory models, which present 78%, 77%, 68.67% and 75% accuracy, respectively.
Affiliation(s)
- Shweta Agrawal
  - Institute of Advanced Computing, SAGE University, Indore 452010, India
- Sanjiv Kumar Jain
  - Electrical Engineering Department, Medi-Caps University, Indore 453331, India
- Shruti Sharma
  - Department of Computer Science and Engineering, Indore Institute of Science & Technology, Indore 453332, India
- Ajay Khatri
  - Bellurbis Technologies Private Limited, Indore 452001, India
|
16
|
Kour H, Gupta MK. AI Assisted Attention Mechanism for Hybrid Neural Model to Assess Online Attitudes About COVID-19. Neural Process Lett 2022; 55:1-40. [PMID: 36575702 PMCID: PMC9780630 DOI: 10.1007/s11063-022-11112-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/10/2022] [Indexed: 12/24/2022]
Abstract
COVID-19 is a novel virus that presents challenges due to a lack of consistent and in-depth research. The news of the COVID-19 spreads across the globe, resulting in a flood of posts on social media sites. Apart from health, social, and economic disturbances brought by the COVID-19 pandemic, another important consequence involves public mental health crises which is of greater concern. Data related to COVID-19 is a valuable asset for researchers in understanding people's feelings related to the pandemic. It is thus important to extract the early information evolving public sentiments on social platforms during the outbreak of COVID-19. The objective of this study is to look at people's perceptions of the COVID-19 pandemic who interact with each other and share tweets on the Twitter platform. COVIDSenti, a large-scale benchmark dataset comprising 90,000 COVID-19 tweets collected from February to March 2020, during the initial phases of the outbreak served as the foundation for our experiments. A pre-trained bidirectional encoder representations from transformers (BERT) model is fine-tuned and embeddings generated are combined with two long short-term memory networks to propose the residual encoder transformation network model. The proposed model is used for multiclass text classification on a large dataset labeled as positive, negative, and neutral. The experimental outcomes validate that: (1) the proposed model is the best performing model, with 98% accuracy and 96% F1-score; (2) It also outperforms conventional machine learning algorithms and different variants of BERT, and (3) the approach achieves better results as compared to state-of-the-art on different benchmark datasets.
Affiliation(s)
- Harnain Kour
  - Department of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, India
- Manoj K. Gupta
  - Department of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, India
|
17
|
Aslam N, Xia K, Rustam F, Lee E, Ashraf I. Self voting classification model for online meeting app review sentiment analysis and topic modeling. PeerJ Comput Sci 2022; 8:e1141. [PMID: 37346305 PMCID: PMC10280218 DOI: 10.7717/peerj-cs.1141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 10/10/2022] [Indexed: 06/23/2023]
Abstract
Online meeting applications (apps) have emerged as a potential solution for conferencing, education and meetings, etc. during the COVID-19 outbreak and are used by private companies and governments alike. A large number of such apps compete with each other by providing a different set of functions towards users' satisfaction. These apps take users' feedback in the form of opinions and reviews which are later used to improve the quality of services. Sentiment analysis serves as the key function to obtain and analyze users' sentiments from the posted feedback indicating the importance of efficient and accurate sentiment analysis. This study proposes the novel idea of self voting classification (SVC) where multiple variants of the same model are trained using different feature extraction approaches and the final prediction is based on the ensemble of these variants. For experiments, the data collected from the Google Play store for online meeting apps were used. Primarily, the focus of this study is to use a support vector machine (SVM) with the proposed SVC approach using both soft voting (SV) and hard voting (HV) criteria, however, decision tree, logistic regression, and k nearest neighbor have also been investigated for performance appraisal. Three variants of models are trained on a bag of words, term frequency-inverse document frequency, and hashing features to make the ensemble. Experimental results indicate that the proposed SVC approach can elevate the performance of traditional machine learning models substantially. The SVM obtains 1.00 and 0.98 accuracy scores, using HV and SV criteria, respectively when used with the proposed SVC approach. Topic-wise sentiment analysis using the latent Dirichlet allocation technique is performed as well for topic modeling.
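The hard voting (HV) and soft voting (SV) criteria mentioned in this abstract differ in what they combine: hard voting counts predicted labels, while soft voting averages predicted class probabilities. The sketch below shows that difference on invented outputs of three variants of one model; the function names and the probabilities are assumptions, not the authors' implementation.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Return the label predicted by most variants (majority vote)."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(predicted_probabilities):
    """Average the per-class probabilities of all variants and pick the largest."""
    classes = predicted_probabilities[0].keys()
    n = len(predicted_probabilities)
    mean = {c: sum(p[c] for p in predicted_probabilities) / n for c in classes}
    return max(mean, key=mean.get)


# Hypothetical outputs of three variants of one model for a single review,
# e.g. trained on bag-of-words, TF-IDF and hashing features respectively.
probabilities = [
    {"positive": 0.55, "negative": 0.45},
    {"positive": 0.60, "negative": 0.40},
    {"positive": 0.10, "negative": 0.90},
]
labels = [max(p, key=p.get) for p in probabilities]

print(hard_vote(labels), soft_vote(probabilities))
```

In this toy case the two criteria disagree: two of the three variants lean weakly positive, so hard voting returns "positive", while the third variant's confident negative prediction pulls the averaged probabilities to "negative". That sensitivity to confidence is the main design difference between the two criteria.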
Affiliation(s)
- Naila Aslam
  - School of Electronics and Information Engineering, Hebei University of Technology, Tianjin, China
- Kewen Xia
  - School of Electronics and Information Engineering, Hebei University of Technology, Tianjin, China
- Furqan Rustam
  - School of Computer Science, University College Dublin, Dublin, Ireland
- Ernesto Lee
  - College of Engineering and Technology, Miami Dade College, Miami, FL, USA
- Imran Ashraf
  - Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Republic of Korea
|
18
|
Umer M, Sadiq S, Karamti H, Abdulmajid Eshmawi A, Nappi M, Usman Sana M, Ashraf I. ETCNN: Extra Tree and Convolutional Neural Network-based Ensemble Model for COVID-19 Tweets Sentiment Classification. Pattern Recognit Lett 2022; 164:224-231. [PMID: 36407854 PMCID: PMC9664766 DOI: 10.1016/j.patrec.2022.11.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/30/2022] [Revised: 10/09/2022] [Accepted: 11/11/2022] [Indexed: 11/17/2022]
Abstract
Pandemics influence people negatively and people experience fear and disappointment. With the global outspread of COVID-19, the sentiments of the general public are substantially influenced, and analyzing their sentiments could help to devise corresponding policies to alleviate negative sentiments. Often the data collected from social media platforms is unstructured leading to low classification accuracy. This study brings forward an ensemble model where the benefits of handcrafted features and automatic feature extraction are combined by machine learning and deep learning models. Unstructured data is obtained, preprocessed, and annotated using TextBlob and VADER before training machine learning models. Similarly, the efficiency of Word2Vec, TF, and TF-IDF features is also analyzed. Results reveal the better performance of the extra tree classifier when trained with TF-IDF features from TextBlob annotated data. Overall, machine learning models perform better with TF-IDF and TextBlob. The proposed model obtains superior performance using both annotation techniques with 0.97 and 0.95 scores of accuracy using TextBlob and VADER respectively with Word2Vec features. Results reveal that use of machine learning and deep learning models together with a voting criterion tends to yield better results than other machine learning models. Analysis of sentiments indicates that predominantly people possess negative sentiments regarding COVID-19.
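The annotation step in this abstract labels tweets from lexicon scores produced by TextBlob and VADER. The sketch below shows only the score-to-label mapping and why two annotators can disagree; it does not call either library. The ±0.05 cut-off is the threshold commonly recommended for VADER's compound score, the sign rule for TextBlob polarity is a common convention rather than something the abstract states, and the scores are invented.

```python
def label_textblob(polarity):
    """Map a TextBlob-style polarity in [-1, 1] to a label using its sign."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def label_vader(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical (polarity, compound) scores for three tweets.
scores = [(0.40, 0.62), (0.02, 0.02), (-0.30, -0.51)]

annotations = [(label_textblob(p), label_vader(c)) for p, c in scores]
for pair in annotations:
    print(pair)
```

The second tweet is labelled positive under the sign rule but neutral under the thresholded rule, so the same text can end up in different training classes depending on the annotator, which is one reason to compare annotation techniques as the study does.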
Affiliation(s)
- Muhammad Umer
  - Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, 63100, Pakistan
- Saima Sadiq
  - Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Hanen Karamti
  - Department of computer sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Michele Nappi (corresponding author)
  - Department of Computer Science, University of Salerno, Fisciano, Italy
- Muhammad Usman Sana
  - College of Computer Science Technology, Xian University of Science and Technology, Xian, Shaanxi 710054, China
- Imran Ashraf (corresponding author)
  - Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea
|
19
|
Studying topic engagement and synergy among candidates for 2020 US Elections. SOCIAL NETWORK ANALYSIS AND MINING 2022; 12:136. [PMID: 36118938 PMCID: PMC9464427 DOI: 10.1007/s13278-022-00959-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 08/13/2022] [Accepted: 08/21/2022] [Indexed: 10/31/2022]
Abstract
This article provides a comprehensive summary of how candidates running in the 2020 US Presidential Elections used Twitter to communicate with the public. More specifically, it aims to uncover elements linked to public engagement and internal cooperation (in terms of content and stance similarity among the candidates from the same political front, and with respect to the official Twitter accounts of their political parties). Our main subjects are the Presidential and Vice-Presidential candidates who contested for the 2020 US Elections from the two major political fronts—Republicans and Democrats. Their tweets were evaluated for social reach, content similarity and stance similarity on 22 topics. According to the findings, Joe Biden had the highest engagement and impact (user impact: 177.08k, normalized to 0.99), followed by Donald Trump (user impact: 164.19k, normalized to 0.92). The Democrats depicted a clearer understanding of their audience, portraying an essential link between public participation, internal cooperation and the electoral campaign. The results also demonstrate that specific topics (like US Elections, and Inauguration Ceremony) were more engaging than others (Trump Healthcare Plan, and The Supreme Court Appointments). This study adds to the existing work on using social media platforms for electoral campaigns and can be effectively utilized by contesting candidates.
|
20
|
Ensemble learning-based feature engineering to analyze maternal health during pregnancy and health risk prediction. PLoS One 2022; 17:e0276525. [DOI: 10.1371/journal.pone.0276525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 10/08/2022] [Indexed: 11/11/2022] Open
Abstract
Maternal health is an important aspect of women’s health during pregnancy, childbirth, and the postpartum period. Specifically, during pregnancy, different health factors like age, blood disorders, heart rate, etc. can lead to pregnancy complications. Detecting such health factors can alleviate the risk of pregnancy-related complications. This study aims to develop an artificial neural network-based system for predicting maternal health risks using health data records. A novel deep neural network architecture, DT-BiLTCN is proposed that uses decision trees, a bidirectional long short-term memory network, and a temporal convolutional network. Experiments involve using a dataset of 1218 samples collected from maternal health care, hospitals, and community clinics using the IoT-based risk monitoring system. Class imbalance is resolved using the synthetic minority oversampling technique. DT-BiLTCN provides a feature set to obtain high accuracy results which in this case are provided by the support vector machine with a 98% accuracy. Maternal health exploratory data analysis reveals that the health conditions which are the strongest indications of health risk during pregnancy are diastolic and systolic blood pressure, heart rate, and age of pregnant women. Using the proposed model, timely prediction of health risks associated with pregnant women can be made thus mitigating the risk of health complications which helps to save lives.
|
21
|
Semantic Sentiment Classification for COVID-19 Tweets Using Universal Sentence Encoder. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:6354543. [PMID: 36248924 PMCID: PMC9556213 DOI: 10.1155/2022/6354543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/30/2022] [Accepted: 09/23/2022] [Indexed: 11/17/2022]
Abstract
The spread of data on the web has increased in the last twenty years. One of the reasons is the appearance of social media. The data on social sites describe many real-life events in our daily lives. In the period of the COVID-19 pandemic, a lot of people and media organizations were writing and documenting their health status and the latest news about the coronavirus on social media. Using these tweets (sentiments) about the coronavirus and analyzing them in a computational model can help decision makers in measuring public opinion and yielding remarkable findings. In this research article, we introduce a deep learning sentiment analysis model based on Universal Sentence Encoder. The dataset used in this research was collected from Twitter, and it was classified as positive, neutral, and negative. The sentence embedding model determines the meaning of word sequences instead of individual words. The model divides the dataset into training and testing and depends on the sentence similarity in detecting sentiment class. The obtained accuracy results reached 78.062%, and this result outperforms many traditional ML classifiers based on TF-IDF applied on the same dataset and another model based on the CNN classifier.
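The abstract says the model relies on sentence similarity to detect the sentiment class. One simple way to read that, assumed here purely for illustration, is a nearest-neighbour rule over sentence embeddings compared by cosine similarity. The three-dimensional vectors below are toy stand-ins for real sentence embeddings, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_by_similarity(query, labelled_embeddings):
    """Return the label of the training embedding most similar to `query`."""
    best_label, _ = max(
        ((label, cosine_similarity(query, emb)) for emb, label in labelled_embeddings),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy embeddings with their sentiment labels (hypothetical training data).
training = [
    ([0.9, 0.1, 0.0], "positive"),
    ([0.1, 0.9, 0.0], "negative"),
    ([0.1, 0.1, 0.9], "neutral"),
]

query_embedding = [0.8, 0.2, 0.1]  # stand-in for an unseen tweet's embedding
predicted = classify_by_similarity(query_embedding, training)
print(predicted)
```

Cosine similarity ignores vector length and compares direction only, which suits sentence embeddings where the direction carries most of the meaning. A real system would obtain the embeddings from the encoder and would usually compare against many training sentences per class.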
|
22
|
Rahman MM, Khan NI, Sarker IH, Ahmed M, Islam MN. Leveraging machine learning to analyze sentiment from COVID-19 tweets: A global perspective. ENGINEERING REPORTS : OPEN ACCESS 2022; 5:e12572. [PMID: 36247344 PMCID: PMC9538004 DOI: 10.1002/eng2.12572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 07/13/2022] [Accepted: 08/15/2022] [Indexed: 06/16/2023]
Abstract
Since the advent of the worldwide COVID-19 pandemic, analyzing public sentiment has become one of the major concerns for policy and decision-makers. While the priority is to curb the spread of the virus, mass population (user) sentiment analysis is equally important. Though sentiment analysis using different state-of-the-art technologies has been focused on during the COVID-19 pandemic, the reasons behind the variations in public sentiment are yet to be explored. Moreover, how user sentiment varies due to the COVID-19 pandemic from a cross-country perspective has been less focused on. Therefore, the objectives of this study are: to identify the most effective machine learning (ML) technique for classifying public sentiments, to analyze the variations of public sentiment across the globe, and to find the critical contributing factors to sentiment variations. To attain the objectives, 12,000 tweets, 3000 each from the USA, UK, and Bangladesh, were rigorously annotated by three independent reviewers. Based on the labeled tweets, four different boosting ML models, namely, CatBoost, gradient boost, AdaBoost, and XGBoost, are investigated. Next, the top performed ML model predicted sentiment of 300,000 data (100,000 from each country). The public perceptions have been analyzed based on the labeled data. As an outcome, the CatBoost model showed the highest (85.8%) F1-score, followed by gradient boost (84.3%), AdaBoost (78.9%), and XGBoost (83.1%). Second, it was revealed that during the time of the COVID-19 pandemic, the sentiments of the people of the three countries mainly were negative, followed by positive and neutral. Finally, this study identified a few critical concerns that impact primarily varying public sentiment around the globe: lockdown, quarantine, hospital, mask, vaccine, and the like.
Affiliation(s)
- Md Mahbubar Rahman
  - Department of Computer Science and Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
- Nafiz Imtiaz Khan
  - Department of Computer Science and Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
- Iqbal H. Sarker
  - Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong, Bangladesh
- Mohiuddin Ahmed
  - School of Science, Edith Cowan University, Joondalup, Western Australia, Australia
- Muhammad Nazrul Islam
  - Department of Computer Science and Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
|
23
|
Ahmad A, Rustam F, Saad E, Siddique MA, Lee E, Mansilla AO, Díez IDLT, Ashraf I. Analyzing preventive precautions to limit spread of COVID-19. PLoS One 2022; 17:e0272350. [PMID: 36001556 PMCID: PMC9401132 DOI: 10.1371/journal.pone.0272350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Accepted: 07/19/2022] [Indexed: 01/08/2023] Open
Abstract
With the global spread of COVID-19, governments advised the public to adopt safety precautions to limit its spread. The virus spreads from people, contaminated places, and nozzle droplets, which necessitates strict precautionary measures. Consequently, different safety precautions have been implemented to fight COVID-19, such as wearing a facemask, restricting social gatherings, and keeping a distance of 6 feet. Despite the warnings, the highlighted need for such measures, and the increasing severity of the pandemic situation, the expected number of people adopting these precautions is low. This study aims at assessing and understanding the public perception of COVID-19 safety precautions, especially the use of facemasks. A unified framework of a sentiment lexicon with the proposed ensemble EB-DT is devised to analyze sentiments regarding safety precautions. Extensive experiments are performed with a large dataset collected from Twitter. In addition, the factors leading to a negative perception of safety precautions are analyzed by performing topic analysis using the latent Dirichlet allocation algorithm. The experimental results reveal that 12% of the tweets correspond to negative sentiments towards the facemask precaution, mainly because of its discomfort. Analysis of the change in people's sentiment over time indicates a gradual increase in positive sentiments regarding COVID-19 restrictions.
Affiliation(s)
- Ayaz Ahmad
- Department of Computer Science, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Furqan Rustam
- Department of Software Engineering, School of Systems and Technology, University of Management and Technology Lahore, Lahore, Pakistan
- Eysha Saad
- Department of Computer Science, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Muhammad Abubakar Siddique
- Department of Computer Science, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Ernesto Lee
- Department of Computer Science, Broward College, Broward County, Florida, United States of America
- Arturo Ortega Mansilla
- European University of The Atlantic, Santander, Spain
- Iberoamerican International University, Campeche, Mexico
- Isabel de la Torre Díez
- Department of Signal Theory and Communications and Telematic Engineering, University of Valladolid, Valladolid, Spain
- Imran Ashraf
- Information and Communication Engineering, Yeungnam University, Gyeongsan, Korea
24
Umair A, Masciari E. Human sentiments monitoring during COVID-19 using AI-based modeling. Procedia Computer Science 2022; 203:753-758. [PMID: 35974968 PMCID: PMC9374315 DOI: 10.1016/j.procs.2022.07.112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/01/2023]
Abstract
The whole world is facing health challenges due to the wide spread of the COVID-19 pandemic. To control the spread of COVID-19, the development of a vaccine is the need of the hour. Considering the importance of vaccines, many industries have put their efforts into vaccine development. Higher immunity against COVID-19 can be achieved through high vaccine uptake. Therefore, it is important to analyse people's behaviour and sentiments towards vaccines. Today is the era of social media, where people share their emotions, experiences, or opinions about any trending topic in the form of tweets, comments, or posts. In this study, we used the freely available COVID-19 vaccines dataset and analysed people's reactions to the vaccine campaign using artificial intelligence methods. We used the TextBlob() function in Python to determine the polarity of the tweets. We applied the BERT model and classified the tweets into negative and positive classes based on their polarity values. The classification results show that BERT achieved maximum values of precision, recall, and F score for both positive and negative sentiment classification.
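The polarity-based labelling step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes polarity scores in [-1, 1] (the range of TextBlob's `sentiment.polarity`) have already been computed, and the zero threshold, the function name `label_from_polarity`, and the choice to leave exactly-neutral tweets unlabelled are all hypothetical.

```python
from typing import Optional


def label_from_polarity(polarity: float) -> Optional[str]:
    """Map a polarity score in [-1, 1] to a binary sentiment label.

    Assumption: scores above 0 are positive, scores below 0 are negative,
    and a score of exactly 0 is treated as neutral and left unlabelled.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return None


# Hypothetical (tweet, polarity) pairs; the scores are made up for illustration.
scored = [
    ("great vaccine news", 0.8),
    ("side effects worry me", -0.4),
    ("vaccine", 0.0),
]

# Keep only tweets that receive a positive or negative label.
labelled = [
    (text, label_from_polarity(p))
    for text, p in scored
    if label_from_polarity(p) is not None
]
```

Labels produced this way could then serve as training targets for a classifier; how the study handled neutral tweets is not stated in the abstract.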
Affiliation(s)
- Areeba Umair
- Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Naples 80125, Italy
- Elio Masciari
- Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Naples 80125, Italy
- Institute for High Performance Computing and Networking (ICAR), National Research Council, Naples, Italy
25
Dangi D, Dixit DK, Bhagat A. Sentiment analysis of COVID-19 social media data through machine learning. Multimedia Tools and Applications 2022; 81:42261-42283. [PMID: 35912062 PMCID: PMC9309239 DOI: 10.1007/s11042-022-13492-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2021] [Revised: 10/15/2021] [Accepted: 07/13/2022] [Indexed: 06/15/2023]
Abstract
Pandemics are a severe threat to life worldwide, and the world has encountered several pandemics to date. COVID-19 is one of them, a viral infectious disease that has increased morbidity and mortality worldwide. It has had a negative impact on countries' economies and has raised social and political concerns throughout the world. With the growth of social media, a great deal of pandemic-related news has been shared by many groups of people, and this news has also helped to analyze the effects of the pandemic clearly. Twitter is one of the social media networks where people have widely shared COVID-19-related news. Meanwhile, several approaches have been proposed for COVID-19-related sentiment analysis. To enhance the accuracy of sentiment analysis, we propose a novel approach known as Sentimental Analysis of Twitter social media Data (SATD). Our proposed method is based on five machine learning models: Logistic Regression, Random Forest Classifier, Multinomial NB Classifier, Support Vector Machine, and Decision Tree Classifier. These five classifiers have different advantages, and hence the proposed approach effectively classifies the tweets obtained from Twint. Experimental analyses are carried out, and the classifier models are evaluated using precision, recall, f1-score, and support. The results are also presented as confusion matrices, accuracy, precision, and receiver operating characteristic (ROC) graphs. The experiments and discussion indicate that the accuracy of the proposed classifier model is high.
Affiliation(s)
- Dharmendra Dangi
- Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India
- Dheeraj K. Dixit
- Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India
- Amit Bhagat
- Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India
26
Singhal A, Baxi MK, Mago V. Synergy between Public and Private Healthcare Organizations during COVID-19 on Twitter. JMIR Med Inform 2022; 10:e37829. [PMID: 35849795 PMCID: PMC9390834 DOI: 10.2196/37829] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/09/2022] [Revised: 07/08/2022] [Accepted: 07/15/2022] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND: Social media platforms (SMPs) are frequently used by pharmaceutical companies, public health agencies, and NGOs to communicate health concerns, new advancements, and potential outbreaks. While the benefits of using them as a tool have been extensively discussed, the online activity of healthcare organizations on SMPs during COVID-19 has not been thoroughly investigated in terms of engagement and sentiment forecasting.
OBJECTIVE: The purpose of this research is to analyze the nature of information shared on Twitter, understand the public engagement generated on it, and forecast the sentiment score for various organizations.
METHODS: Data was collected from the Twitter handles of five pharmaceutical companies, ten U.S. and Canadian public health agencies, and the World Health Organization (WHO) between January 1, 2017, and December 31, 2021. A total of 181,469 tweets were divided into two phases for the analysis, before COVID-19 and during COVID-19, based on the confirmation of the first COVID-19 community transmission case in North America on February 26, 2020. We conducted content analysis to generate health-related topics using Natural Language Processing (NLP) based topic modeling techniques, analyzed public engagement on Twitter, and performed sentiment forecasting using 16 univariate moving-average and machine learning (ML) models to understand the correlation between public opinion and tweet contents.
RESULTS: We utilized the topics modeled from the tweets authored by the health organizations chosen for our analysis using Non-Negative Matrix Factorization (NMF) ('c_umass' scores: -3.6530 and -3.7944, before COVID-19 and during COVID-19 respectively). The topics are 'Chronic Diseases', 'Health Research', 'Community Healthcare', 'Medical Trials', 'COVID-19', 'Vaccination', 'Nutrition and Well-being', and 'Mental Health'. In terms of user impact, WHO (user impact: 4171.24) had the highest impact overall, followed by the public health agencies CDC (user impact: 2895.87) and NIH (user impact: 891.06). Among pharmaceutical companies, Pfizer's user impact was the highest at 97.79. Furthermore, for sentiment forecasting, ARIMA and SARIMAX models performed best on the majority of the subsets of data (divided by health organization and time period), with Mean Absolute Error (MAE) between 0.027 and 0.084, Mean Squared Error (MSE) between 0.001 and 0.011, and Root Mean Squared Error (RMSE) between 0.031 and 0.105.
CONCLUSIONS: Our findings indicate that people engage more on topics like 'COVID-19' than on 'Medical Trials' or 'Customer Experience'. Also, there are notable differences in user engagement levels across organizations. Global organizations, like WHO, show wide variations in engagement levels over time. The sentiment forecasting method discussed presents a way for organizations to structure their future content to ensure maximum user engagement.
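For readers unfamiliar with the three forecasting error measures reported in this abstract, the following is a minimal sketch of how MAE, MSE, and RMSE are computed. The sample values and function names are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing any metric."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error: the average of |actual - predicted|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Squared Error: the average of (actual - predicted)^2."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical sentiment scores and forecasts (illustrative values only).
actual = [0.10, 0.20, 0.15, 0.30]
forecast = [0.12, 0.18, 0.20, 0.25]

errors = {
    "MAE": mae(actual, forecast),
    "MSE": mse(actual, forecast),
    "RMSE": rmse(actual, forecast),
}
```

Because RMSE is the square root of MSE, an MSE range of 0.001 to 0.011 corresponds to an RMSE of roughly 0.032 to 0.105, which agrees with the RMSE range reported above.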
Affiliation(s)
- Aditya Singhal
- Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, CA
- Manmeet Kaur Baxi
- Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, CA
- Vijay Mago
- Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, CA
27
Trivedi SK, Patra P, Singh A, Deka P, Srivastava PR. Analyzing the research trends of COVID-19 using topic modeling approach. Journal of Modelling in Management 2022. [DOI: 10.1108/jm2-02-2022-0045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose
The COVID-19 pandemic has impacted 222 countries across the globe, with millions of people losing their lives. The threat from the virus may be assessed from the fact that most countries across the world have been forced to order partial or complete shutdown of their economies for a period of time to contain the spread of the virus. The fallout of this action manifested in loss of livelihood, migration of the labor force and severe impact on mental health due to the long duration of confinement to homes or residences.
Design/methodology/approach
The current study identifies the focus areas of the research conducted on the COVID-19 pandemic. Abstracts of papers on the subject were collated from the SCOPUS database for the period December 2019 to June 2020. The collected sample data (after preprocessing) was analyzed using Topic Modeling with Latent Dirichlet Allocation.
Findings
Based on the research papers published within the mentioned timeframe, the study identifies the 10 most prominent topics that formed the area of interest for the COVID-19 pandemic research.
Originality/value
While similar studies exist, no other work has used topic modeling to comprehensively analyze the COVID-19 literature by considering diverse fields and domains.
28
Devi B, Preetha MMSJ. An Innovative Facial Emotion Recognition Model Enabled by Optimal Feature Selection Using Firefly Plus Jaya Algorithm. International Journal of Swarm Intelligence Research 2022. [DOI: 10.4018/ijsir.304399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
This paper intends to develop an intelligent facial emotion recognition model that follows four major processes: (a) face detection, (b) feature extraction, (c) optimal feature selection, and (d) classification. In the face detection stage, the human face is detected using the Viola-Jones method. The detected face image is then subjected to feature extraction via (a) LBP, (b) DWT, and (c) GLCM. Because the extracted feature vector is large, it is essential to choose the most relevant features. The optimally chosen features are classified using an NN. The outcome of the NN indicates the type of emotion: normal, disgust, fear, angry, smile, surprise, or sad. As a novelty, this research work enhances the classification accuracy of facial emotions by selecting the optimal features as well as optimizing the weights of the NN. Both tasks are accomplished by hybridizing the concepts of FF and JA, referred to together as MF-JFF. The output of the NN is the recognized facial emotion, and the whole model is referred to as MF-JFF-NN.
Affiliation(s)
- Bhagyashri Devi
- Department of ECE, Noorul Islam Centre for Higher Education, India
29
TED-S: Twitter Event Data in Sports and Politics with Aggregated Sentiments. Data 2022. [DOI: 10.3390/data7070090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023] Open
Abstract
Even though social media contain rich information on events and public opinions, it is impractical to filter this information manually because of the volume and dynamic nature of the data. Thus, automated extraction mechanisms are invaluable to the community. We need real data with ground truth labels to build and evaluate such systems. Still, to the best of our knowledge, no available social media dataset covers continuous periods with both event and sentiment labels; existing datasets are labelled for either events or sentiments. Datasets without time gaps are huge due to high data generation and require extensive effort for manual labelling. Different approaches, ranging from unsupervised to supervised, have been proposed by previous research targeting such datasets. However, their generic nature mainly fails to capture event-specific sentiment expressions, making them inappropriate for labelling event sentiments. Filling this gap, we propose a novel data annotation approach in this paper involving several neural networks. Our approach outperforms commonly used sentiment annotation models such as VADER and TextBlob. It also generates probability values for all sentiment categories besides providing a single category per tweet, supporting aggregated sentiment analyses. Using this approach, we annotate and release a dataset named TED-S, covering two diverse domains, sports and politics. TED-S has complete subsets of Twitter data streams with both sub-event and sentiment labels, providing the ability to support event sentiment-based research.
30
Rahul K, Banyal RK. k-Means Clustering with Optimal Centroid: An Optimization Insisted Model for Removing Outliers. Int J Pattern Recogn 2022. [DOI: 10.1142/s0218001422590078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Detecting and correcting corrupt, inaccurate, or irrelevant records in a record set is a tedious part of data cleaning. In particular, "outlier detection" plays a significant role in data cleaning by removing the outliers that exist in the data. Considerable effort has traditionally gone into removing outliers, and one promising way is to customize clustering models. Accordingly, this paper proposes a new outlier detection model via enhanced k-means with outlier removal (E-KMOR), which assigns all outliers to a group naturally during the clustering process. To decide which points are outliers, a new intra-cluster based distance evaluation is employed. The main contribution of this paper is to select cluster centroids optimally through a newly proposed hybrid optimization algorithm termed particle updated lion algorithm (PU-LA), which hybridizes the concepts of LA and particle swarm optimization (PSO). The proposed work is therefore named E-KMOR-PU-LA. Finally, the efficacy of the proposed E-KMOR-PU-LA model is demonstrated through a comparative analysis against conventional models with respect to runtime and accuracy.
Affiliation(s)
- Kumar Rahul
- Department of Basic and Applied Science, NIFTEM, Sonipat 131028, Haryana, India
- Rohitash Kumar Banyal
- Department of Computer Science and Engineering, Rajasthan Technical University, Kota 324010, Rajasthan, India
31
Jeong H, Bayro A, Umesh SP, Mamgain K, Lee M. A Perspective of COVID-19 and Healthcare: Using Social Media Data and an Aspect-based Sentiment Analysis for Usability Evaluation of a Wearable Mixed Reality Headset. JMIR Serious Games 2022; 10:e36850. [PMID: 35708916 PMCID: PMC9359310 DOI: 10.2196/36850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 05/27/2022] [Accepted: 06/12/2022] [Indexed: 12/02/2022] Open
Abstract
Background: Mixed reality (MR) devices provide real-time environments for physical-digital interactions across many domains. Owing to the unprecedented COVID-19 pandemic, MR technologies have supported many new use cases in the health care industry, enabling social distancing practices to minimize the risk of contact and transmission. Despite their novelty and increasing popularity, public evaluations are sparse and often rely on social interactions among users, developers, researchers, and potential buyers.
Objective: The purpose of this study is to use aspect-based sentiment analysis to explore changes in sentiment during the onset of the COVID-19 pandemic as new use cases emerged in the health care industry; to characterize net insights for MR developers, researchers, and users; and to analyze the features of HoloLens 2 (Microsoft Corporation) that are helpful for certain fields and purposes.
Methods: To investigate the user sentiment, we collected 8492 tweets on a wearable MR headset, HoloLens 2, during the initial 10 months since its release in late 2019, coinciding with the onset of the pandemic. Human annotators rated the individual tweets as positive, negative, neutral, or inconclusive. Furthermore, by hiring an interannotator to ensure agreements between the annotators, we used various word vector representations to measure the impact of specific words on sentiment ratings. Following the sentiment classification for each tweet, we trained a model for sentiment analysis via supervised learning.
Results: The results of our sentiment analysis showed that the bag-of-words tokenizing method using a random forest supervised learning approach produced the highest accuracy of the test set at 81.29%. Furthermore, the results showed an apparent change in sentiment during the COVID-19 pandemic period. During the onset of the pandemic, consumer goods were severely affected, which aligns with a drop in both positive and negative sentiment. Following this, there is a sudden spike in positive sentiment, hypothesized to be caused by the new use cases of the device in health care education and training. This pandemic also aligns with drastic changes in the increased number of practical insights for MR developers, researchers, and users and positive net sentiments toward the HoloLens 2 characteristics.
Conclusions: Our approach suggests a simple yet effective way to survey public opinion about new hardware devices quickly. The findings of this study contribute to a holistic understanding of public perception and acceptance of MR technologies during the COVID-19 pandemic and highlight several new implementations of HoloLens 2 in health care. We hope that these findings will inspire new use cases and technological features.
Affiliation(s)
- Heejin Jeong
- University of Illinois at Chicago, 842 West Taylor St, Chicago, US
- Allison Bayro
- University of Illinois at Chicago, 842 West Taylor St, Chicago, US
- Kaushal Mamgain
- University of Illinois at Chicago, 842 West Taylor St, Chicago, US
- Moontae Lee
- University of Illinois at Chicago, 842 West Taylor St, Chicago, US
32
Kazijevs M, Akyelken FA, Samad MD. Mining Social Media Data to Predict COVID-19 Case Counts. IEEE International Conference on Healthcare Informatics 2022; 2022:104-111. [PMID: 36148026 PMCID: PMC9490453 DOI: 10.1109/ichi54592.2022.00027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
The unpredictability and unknowns surrounding the ongoing coronavirus disease (COVID-19) pandemic have had unprecedented consequences, taking a heavy toll on the lives and economies of all countries. There have been efforts to predict COVID-19 case counts (CCC) using epidemiological data and numerical tokens online, which may allow early preventive measures to slow the spread of the disease. In this paper, we use state-of-the-art natural language processing (NLP) algorithms to numerically encode COVID-19 related tweets originating from eight cities in the United States and predict city-specific CCC up to eight days in the future. A city-embedding is proposed to obtain a time series representation of daily tweets posted from a city, which is then used to predict case counts using a custom long short-term memory (LSTM) model. The universal sentence encoder yields the best normalized root mean squared error (NRMSE) of 0.090 (0.039), averaged across all cities in predicting CCC six days in the future. The R² scores in predicting CCC are more than 0.70 and often over 0.8, which suggests a strong correlation between the actual and model-predicted CCC values. Our analyses show that the NRMSE and R² scores are consistently robust across different cities and different numbers of time steps in the time series data. Results show that the LSTM model can learn the mapping between the NLP-encoded tweet semantics and the case counts, which implies that social media text can be directly mined to identify the future course of the pandemic.
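The two evaluation measures named in this abstract can be illustrated with a short sketch. The abstract does not say how the RMSE was normalized, so dividing by the range of the actual values is an assumption here (dividing by the mean is another common convention). The sample values and function names are hypothetical.

```python
import math
from typing import Sequence


def nrmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the range of the actual values.

    Assumption: range normalization; the study may have used another one.
    """
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (max(actual) - min(actual))


def r2_score(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily case counts and model predictions (illustrative only).
actual_cases = [100.0, 120.0, 150.0, 130.0]
predicted_cases = [110.0, 115.0, 140.0, 135.0]

scores = {
    "NRMSE": nrmse(actual_cases, predicted_cases),
    "R2": r2_score(actual_cases, predicted_cases),
}
```

A lower NRMSE and an R² closer to 1 both indicate predictions that track the actual series more closely. Both functions assume the actual values are not all identical, since that would make the denominator zero.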
Affiliation(s)
- Maksims Kazijevs
- Dept. of Computer Science, Tennessee State University, Nashville, TN, USA
- Furkan A Akyelken
- Dept. of Computer Science, Tennessee State University, Nashville, TN, USA
- Manar D Samad
- Dept. of Computer Science, Tennessee State University, Nashville, TN, USA
33
COVID-19 Tweets Classification Based on a Hybrid Word Embedding Method. Big Data and Cognitive Computing 2022. [DOI: 10.3390/bdcc6020058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
In March 2020, the World Health Organisation declared that COVID-19 was a new pandemic. This deadly virus spread and affected many countries in the world. During the outbreak, social media platforms such as Twitter contributed valuable and massive amounts of data to better assess health-related decision making. Therefore, we propose that users' sentiments could be analysed with the application of effective supervised machine learning approaches to predict disease prevalence and provide early warnings. The collected tweets were preprocessed and categorised as negative, positive, or neutral. In the second phase, different features were extracted from the posts by applying several widely used techniques, such as TF-IDF, Word2Vec, GloVe, and FastText, to build the feature sets. The novelty of this study is based on hybrid feature extraction, where we combined syntactic features (TF-IDF) with semantic features (FastText and GloVe) to represent posts accurately, which helps in improving the classification process. Experimental results show that FastText combined with TF-IDF performed better with SVM than the other models. SVM achieved the highest accuracy at 88.72%, followed by XGBoost at 85.29%. This study shows that the hybrid methods proved capable of extracting features from the tweets and increasing classification performance.
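One way to read the hybrid feature extraction described in this abstract is as a concatenation of a TF-IDF vector with an averaged word-embedding vector for each post. The sketch below illustrates that reading only; whether the authors concatenated or combined the features differently is not stated, and the toy embeddings, the plain TF-IDF weighting, and all function names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]], vocab: List[str]) -> List[List[float]]:
    """Plain TF-IDF: term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vectors.append([
            (counts[t] / length) * math.log(n_docs / df[t]) if df[t] else 0.0
            for t in vocab
        ])
    return vectors


def mean_embedding(doc: List[str], emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of known tokens; all zeros if none are known."""
    known = [emb[t] for t in doc if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def hybrid_features(docs, vocab, emb, dim):
    """Concatenate each document's TF-IDF vector with its mean embedding."""
    return [
        tv + mean_embedding(doc, emb, dim)
        for tv, doc in zip(tfidf_vectors(docs, vocab), docs)
    ]


# Toy tokenized tweets and made-up 2-d embeddings standing in for
# pretrained FastText or GloVe vectors.
docs = [["covid", "cases", "rise"], ["vaccine", "cases", "fall"]]
vocab = sorted({t for d in docs for t in d})
toy_emb = {"covid": [1.0, 0.0], "vaccine": [0.0, 1.0], "cases": [0.5, 0.5]}
features = hybrid_features(docs, vocab, toy_emb, dim=2)
```

Each row of `features` has one entry per vocabulary term followed by the embedding dimensions, and could be passed to a classifier such as an SVM. In practice the embeddings would come from a pretrained model rather than a hand-written dictionary.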
34
Alkhaldi NA, Asiri Y, Mashraqi AM, Halawani HT, Abdel-Khalek S, Mansour RF. Leveraging Tweets for Artificial Intelligence Driven Sentiment Analysis on the COVID-19 Pandemic. Healthcare (Basel) 2022; 10:healthcare10050910. [PMID: 35628045 PMCID: PMC9141128 DOI: 10.3390/healthcare10050910] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2022] [Revised: 05/09/2022] [Accepted: 05/10/2022] [Indexed: 01/25/2023] Open
Abstract
The COVID-19 pandemic has been a disastrous event that has heightened several psychological issues, such as depression, given abrupt social changes and lack of employment. At the same time, social scientists and psychologists have gained significant interest in understanding the way people express emotions and sentiments at the time of pandemics. During the rise in COVID-19 cases with stricter lockdowns, people expressed their sentiments on social media. This offers a deep understanding of human psychology during catastrophic events. By exploiting user-generated content on social media such as Twitter, people's thoughts and sentiments can be examined, which aids in introducing health intervention policies and awareness campaigns. Recent developments in natural language processing (NLP) and deep learning (DL) models have shown noteworthy performance in sentiment analysis. With this in mind, this paper presents a new sunflower optimization with deep-learning-driven sentiment analysis and classification (SFODLD-SAC) on COVID-19 tweets. The presented SFODLD-SAC model focuses on the identification of people's sentiments during the COVID-19 pandemic. To accomplish this, the SFODLD-SAC model initially preprocesses the tweets in distinct ways such as stemming and the removal of stopwords, usernames, links, punctuation, and numerals. In addition, the TF-IDF model is applied to extract useful features from the preprocessed data. Moreover, the cascaded recurrent neural network (CRNN) model is employed to analyze and classify sentiments. Finally, the SFO algorithm is utilized to optimally adjust the hyperparameters involved in the CRNN model. The design of the SFODLD-SAC technique with the inclusion of an SFO algorithm-based hyperparameter optimizer for analyzing people's sentiments on COVID-19 shows the novelty of this study. The simulation analysis of the SFODLD-SAC model is performed using a benchmark dataset from the Kaggle repository. Extensive comparative results report the promising performance of the SFODLD-SAC model over recent state-of-the-art models, with a maximum accuracy of 99.65%.
Affiliation(s)
- Nora A. Alkhaldi
- Department of Computer Science, College of Computer Sciences and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Yousef Asiri
- Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
- Aisha M. Mashraqi
- Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
- Hanan T. Halawani
- Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
- Sayed Abdel-Khalek
- Department of Mathematics, College of Science, Taif University, Taif 21944, Saudi Arabia
- Romany F. Mansour
- Department of Mathematics, Faculty of Science, New Valley University, El-Kharga 72511, Egypt
35
Vahdat-Nejad H, Salmani F, Hajiabadi M, Azizi F, Abbasi S, Jamalian M, Mosafer R, Bagherzadeh P, Hajiabadi H. Extracting Feelings of People Regarding COVID-19 by Social Network Mining. Journal of Information & Knowledge Management 2022. [DOI: 10.1142/s0219649222400081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
In 2020, COVID-19 became one of the most critical concerns in the world. The topic is still widely discussed on all social networks. Each day, users publish millions of tweets and comments on this subject, implicitly showing the public's ideas and points of view. To extract the public's point of view in various countries at the early stages of the outbreak, a dataset of Coronavirus-related tweets in the English language has been collected, consisting of more than two million tweets from 23 March until 23 June 2020. We first use a lexicon-based approach with the GeoNames geographic database to label each tweet with its location. Next, a method based on the recently introduced and widely cited RoBERTa model is proposed to analyse each tweet's sentiment. Afterwards, analyses showing the frequency of the tweets and their sentiments are reported for each country and for the world as a whole. We mainly focus on the countries with Coronavirus as a hot topic. Graph analysis shows that the frequency of the tweets for most countries is significantly correlated with the official daily statistics of COVID-19. We also discuss some other extracted knowledge that was implicit in the tweets.
Affiliation(s)
- Hamed Vahdat-Nejad
- PerLab, Faculty of Electrical and Computer Engineering, University of Birjand, Iran
- Fatemeh Salmani
- PerLab, Faculty of Electrical and Computer Engineering, University of Birjand, Iran
- Mahdi Hajiabadi
- PerLab, Faculty of Electrical and Computer Engineering, University of Birjand, Iran
- Faezeh Azizi
- PerLab, Faculty of Electrical and Computer Engineering, University of Birjand, Iran
- Sajedeh Abbasi
- PerLab, Faculty of Electrical and Computer Engineering, University of Birjand, Iran
- Mohadese Jamalian
- PerLab, Faculty of Electrical and Computer Engineering, University of Birjand, Iran
- Reyhaneh Mosafer
- PerLab, Faculty of Electrical and Computer Engineering, University of Birjand, Iran
- Hamideh Hajiabadi
- Department of Computer Engineering, Birjand University of Technology, Iran
36
Climate Change Sentiment Analysis Using Lexicon, Machine Learning and Hybrid Approaches. Sustainability 2022. [DOI: 10.3390/su14084723] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/01/2023]
Abstract
The emissions of greenhouse gases, such as carbon dioxide, into the biosphere have the consequence of warming up the planet, hence the existence of climate change. Sentiment analysis has been a popular subject and there has been a plethora of research conducted in this area in recent decades, typically on social media platforms such as Twitter, due to the proliferation of data generated today during discussions on climate change. However, there is not much research on the performances of different sentiment analysis approaches using lexicon, machine learning and hybrid methods, particularly within this domain-specific sentiment. This study aims to find the most effective sentiment analysis approach for climate change tweets and related domains by performing a comparative evaluation of various sentiment analysis approaches. In this context, seven lexicon-based approaches were used, namely SentiWordNet, TextBlob, VADER, SentiStrength, Hu and Liu, MPQA, and WKWSCI. Meanwhile, three machine learning classifiers were used, namely Support Vector Machine, Naïve Bayes, and Logistic Regression, by using two feature extraction techniques, which were Bag-of-Words and TF–IDF. Next, the hybridization between lexicon-based and machine learning-based approaches was performed. The results indicate that the hybrid method outperformed the other two approaches, with hybrid TextBlob and Logistic Regression achieving an F1-score of 75.3%; thus, this has been chosen as the most effective approach. This study also found that lemmatization improved the accuracy of machine learning and hybrid approaches by 1.6%. Meanwhile, the TF–IDF feature extraction technique was slightly better than BoW by increasing the accuracy of the Logistic Regression classifier by 0.6%. However, TF–IDF and BoW had an identical effect on SVM and NB. Future works will include investigating the suitability of deep learning approaches toward this domain-specific sentiment on social media platforms.
|
37
|
A Deep Learning Approach for Sentiment Analysis of COVID-19 Reviews. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12083709] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 01/27/2023]
Abstract
User-generated multi-media content, such as images, text, videos, and speech, has recently become more popular on social media sites as a means for people to share their ideas and opinions. One of the most popular social media sites for providing public sentiment towards events that occurred during the COVID-19 period is Twitter. This is because Twitter posts are short and constantly being generated. This paper presents a deep learning approach for sentiment analysis of Twitter data related to COVID-19 reviews. The proposed algorithm is based on an LSTM-RNN-based network and enhanced featured weighting by attention layers. This algorithm uses an enhanced feature transformation framework via the attention mechanism. A total of four class labels (sad, joy, fear, and anger) from publicly available Twitter data posted in the Kaggle database were used in this study. Based on the use of attention layers with the existing LSTM-RNN approach, the proposed deep learning approach significantly improved the performance metrics, with an increase of 20% in accuracy and 10% to 12% in precision but only 12–13% in recall as compared with the current approaches. Out of a total of 179,108 COVID-19-related tweets, tweets with positive, neutral, and negative sentiments were found to account for 45%, 30%, and 25%, respectively. This shows that the proposed deep learning approach is efficient and practical and can be easily implemented for sentiment classification of COVID-19 reviews.
|
38
|
Tabinda Kokab S, Asghar S, Naz S. Transformer-based deep learning models for the sentiment analysis of social media data. ARRAY 2022. [DOI: 10.1016/j.array.2022.100157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022] Open
|
39
|
Twitter Sentiment Analysis Using Ensemble based Deep Learning Model towards COVID-19 in India and European Countries. Pattern Recognit Lett 2022; 158:164-170. [PMID: 35464347 PMCID: PMC9014659 DOI: 10.1016/j.patrec.2022.04.027] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2021] [Revised: 04/06/2022] [Accepted: 04/16/2022] [Indexed: 11/22/2022]
Abstract
As of November 2021, more than 24.80 crore (248 million) people had been diagnosed with the coronavirus, of whom around 50.20 lakh (5.02 million) lost their lives to this infectious disease. Understanding the sentiments people express on social media (Facebook, Twitter, Instagram, etc.) helps governments control, monitor, and eradicate the coronavirus. Compared to other social media, Twitter data are indispensable for extracting useful awareness information related to any crisis. In this article, a sentiment analysis model is proposed to analyze real-time tweets related to the coronavirus. Initially, around 3100 tweets from Indian and European users are collected between 23.03.2020 and 01.11.2021. Next, data pre-processing and exploratory investigation are performed for a better understanding of the collected data. Further, feature extraction is performed using Term Frequency-Inverse Document Frequency (TF-IDF), GloVe, pre-trained Word2Vec, and fastText embeddings. The obtained feature vectors are fed to the ensemble classifier (Gated Recurrent Unit (GRU) and Capsule Neural Network (CapsNet)) to classify the users' sentiments as anger, sadness, joy, and fear. The experimental outcomes showed that the proposed model achieved 97.28% and 95.20% prediction accuracy in classifying Indian and European users' sentiments.
|
40
|
Predicting the popularity of tweets by analyzing public opinion and emotions in different stages of Covid-19 pandemic. INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT DATA INSIGHTS 2022. [PMCID: PMC8677469 DOI: 10.1016/j.jjimei.2021.100053] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/24/2022]
Abstract
In this study, public opinion and emotions regarding different stages of the Covid-19 pandemic from the outbreak of the disease to the distribution of vaccines were analyzed to predict the popularity of tweets. More than 1.25 million English tweets were collected, posted from January 20, 2020, to May 29, 2021. Five sets of content features, including topic analysis, topics plus TF-IDF vectorizer, bag of words (BOW) by TF-IDF vectorizer, document embedding, and document embedding plus TF-IDF vectorizer, were extracted and applied to supervised machine learning algorithms to generate a predictive model for the retweetability of posted tweets. The analysis showed that tweets with higher emotional intensity are more popular than tweets containing information on Covid-19 pandemic. This study can help to detect the public emotions during the pandemic and after vaccination and predict the retweetability of posted tweets in different stages of Covid-19 pandemic.
|
41
|
Naeem MZ, Rustam F, Mehmood A, Ashraf I, Choi GS. Classification of movie reviews using term frequency-inverse document frequency and optimized machine learning algorithms. PeerJ Comput Sci 2022; 8:e914. [PMID: 35494818 PMCID: PMC9044332 DOI: 10.7717/peerj-cs.914] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/04/2021] [Accepted: 02/12/2022] [Indexed: 06/12/2023]
Abstract
The Internet Movie Database (IMDb), being one of the popular online databases for movies and personalities, provides a wide range of movie reviews from millions of users. This provides a diverse and large dataset to analyze users' sentiments about various personalities and movies. Despite being helpful to provide the critique of movies, the reviews on IMDb cannot be read as a whole and requires automated tools to provide insights on the sentiments in such reviews. This study provides the implementation of various machine learning models to measure the polarity of the sentiments presented in user reviews on the IMDb website. For this purpose, the reviews are first preprocessed to remove redundant information and noise, and then various classification models like support vector machines (SVM), Naïve Bayes classifier, random forest, and gradient boosting classifiers are used to predict the sentiment of these reviews. The objective is to find the optimal process and approach to attain the highest accuracy with the best generalization. Various feature engineering approaches such as term frequency-inverse document frequency (TF-IDF), bag of words, global vectors for word representations, and Word2Vec are applied along with the hyperparameter tuning of the classification models to enhance the classification accuracy. Experimental results indicate that the SVM obtains the highest accuracy when used with TF-IDF features and achieves an accuracy of 89.55%. The sentiment classification accuracy of the models is affected due to the contradictions in the user sentiments in the reviews and assigned labels. For tackling this issue, TextBlob is used to assign a sentiment to the dataset containing reviews before it can be used for training. Experimental results on TextBlob assigned sentiments indicate that an accuracy of 92% can be obtained using the proposed model.
Affiliation(s)
- Muhammad Zaid Naeem
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Furqan Rustam
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Arif Mehmood
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Imran Ashraf
- Information and Communication Engineering, Yeungnam University, Gyeongsan si, Daegu, South Korea
- Gyu Sang Choi
- Information and Communication Engineering, Yeungnam University, Gyeongsan si, Daegu, South Korea
|
42
|
A Hybrid Feature Extraction Method for Nepali COVID-19-Related Tweets Classification. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:5681574. [PMID: 35281187 PMCID: PMC8906125 DOI: 10.1155/2022/5681574] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/07/2021] [Accepted: 02/10/2022] [Indexed: 12/20/2022]
Abstract
COVID-19 is one of the deadliest viruses, which has killed millions of people around the world to this date. The reason for people's deaths is linked not only to infection but also to people's mental states and sentiments triggered by the fear of the virus. People's sentiments, which are predominantly available in the form of posts/tweets on social media, can be interpreted using two kinds of information: syntactical and semantic. Herein, we propose to analyze people's sentiment using both kinds of information (syntactical and semantic) on the COVID-19-related Twitter dataset available in the Nepali language. For this, we first use two widely used text representation methods, TF-IDF and FastText, and then combine them into hybrid features that capture the highly discriminating features. Second, we implement nine widely used machine learning classifiers (Logistic Regression, Support Vector Machine, Naive Bayes, K-Nearest Neighbor, Decision Trees, Random Forest, Extreme Tree classifier, AdaBoost, and Multilayer Perceptron), based on the three feature representation methods: TF-IDF, FastText, and Hybrid. To evaluate our methods, we use a publicly available Nepali COVID-19 tweets dataset, NepCOV19Tweets, which consists of Nepali tweets categorized into three classes (Positive, Negative, and Neutral). The evaluation results on NepCOV19Tweets show that the hybrid feature extraction method not only outperforms the other two individual feature extraction methods across the nine machine learning algorithms but also provides excellent performance when compared with the state-of-the-art methods.
|
43
|
Topic Modeling and Sentiment Analysis of Online Education in the COVID-19 Era Using Social Networks Based Datasets. ELECTRONICS 2022. [DOI: 10.3390/electronics11050715] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/04/2023]
Abstract
Sentiment Analysis (SA) is a technique to study people’s attitudes related to textual data generated from sources like Twitter. This study suggests a powerful and effective technique that can handle large volumes of content and specifically examine the attitudes, sentiments, and fake news related to “E-learning”, which is considered a big challenge, as online textual data related to the education sector is of great importance. On the other hand, fake news and misinformation related to COVID-19 have confused parents, students, and teachers. An efficient detection approach should be used to gather more precise information in order to identify COVID-19 disinformation. Tweet records (people’s opinions) have gained significant attention worldwide for understanding people’s attitudes and behaviors. SA of the COVID-19 education sector still does not provide a clear picture of the information available in these tweets, especially when misinformation and fake news affect the field of E-learning. This study proposes a denoising AutoEncoder to eliminate noise in the information and an attention mechanism for fusing multi-level features, with ELM-AE and LSTM applied to the task of SA classification. Experiments show that our suggested approach obtains a higher F1-score value of 0.945, compared with different state-of-the-art approaches, with various sizes of testing and training datasets. To our knowledge, the proposed model can learn from a unified feature set to obtain better results than a model learned from a subset of features.
|
44
|
COVID-19 Vaccination-Related Sentiments Analysis: A Case Study Using Worldwide Twitter Dataset. Healthcare (Basel) 2022; 10:healthcare10030411. [PMID: 35326889 PMCID: PMC8951387 DOI: 10.3390/healthcare10030411] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/20/2021] [Revised: 02/05/2022] [Accepted: 02/06/2022] [Indexed: 12/23/2022] Open
Abstract
COVID-19 pandemic has caused a global health crisis, resulting in endless efforts to reduce infections, fatalities, and therapies to mitigate its after-effects. Currently, large and fast-paced vaccination campaigns are in the process to reduce COVID-19 infection and fatality risks. Despite recommendations from governments and medical experts, people show conceptions and perceptions regarding vaccination risks and share their views on social media platforms. Such opinions can be analyzed to determine social trends and devise policies to increase vaccination acceptance. In this regard, this study proposes a methodology for analyzing the global perceptions and perspectives towards COVID-19 vaccination using a worldwide Twitter dataset. The study relies on two techniques to analyze the sentiments: natural language processing and machine learning. To evaluate the performance of the different lexicon-based methods, different machine and deep learning models are studied. In addition, for sentiment classification, the proposed ensemble model named long short-term memory-gated recurrent neural network (LSTM-GRNN) is a combination of LSTM, gated recurrent unit, and recurrent neural networks. Results suggest that the TextBlob shows better results as compared to VADER and AFINN. The proposed LSTM-GRNN shows superior performance with a 95% accuracy and outperforms both machine and deep learning models. Performance analysis with state-of-the-art models proves the significance of the LSTM-GRNN for sentiment analysis.
|
45
|
Fakhar Bilal S, Ali Almazroi A, Bashir S, Hassan Khan F, Ali Almazroi A. An ensemble based approach using a combination of clustering and classification algorithms to enhance customer churn prediction in telecom industry. PeerJ Comput Sci 2022; 8:e854. [PMID: 35494841 PMCID: PMC9044233 DOI: 10.7717/peerj-cs.854] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/12/2021] [Accepted: 12/22/2021] [Indexed: 06/14/2023]
Abstract
Mobile communication has become a dominant medium of communication over the past two decades. New technologies and competitors are emerging rapidly and churn prediction has become a great concern for telecom companies. A customer churn prediction model can provide the accurate identification of potential churners so that a retention solution may be provided to them. The proposed churn prediction model is a hybrid model that is based on a combination of clustering and classification algorithms using an ensemble. First, different clustering algorithms (i.e. K-means, K-medoids, X-means and random clustering) were evaluated individually on two churn prediction datasets. Then hybrid models were introduced by combining the clusters with seven different classification algorithms individually and then evaluations were performed using ensembles. The proposed research was evaluated on two different benchmark telecom data sets obtained from GitHub and Bigml platforms. The analysis of results indicated that the proposed model attained the highest prediction accuracy of 94.7% on the GitHub dataset and 92.43% on the Bigml dataset. State of the art comparison was also performed using the proposed model. The proposed model performed significantly better than state of the art churn prediction models.
Affiliation(s)
- Syed Fakhar Bilal
- Computer Science Department, Federal Urdu University of Arts, Science and Technology, Islamabad, Pakistan
- Abdulwahab Ali Almazroi
- University of Jeddah, College of Computing and Information Technology at Khulais, Department of Information Technology, Jeddah, Saudi Arabia
- Saba Bashir
- Computer Science Department, Federal Urdu University of Arts, Science and Technology, Islamabad, Pakistan
- Farhan Hassan Khan
- Knowledge & Data Science Research Center (KDRC), Computer Engineering Department, National University of Science and Technology, Islamabad, Pakistan
- Abdulaleem Ali Almazroi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Rabigh, Saudi Arabia
|
46
|
Jain P, Deshmukh SP. CC-LA: determining optimal switching angles in a cascaded H-bridge multilevel inverter with the aid of binary cat cubpool-based lion algorithm. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:12399-12413. [PMID: 34089163 DOI: 10.1007/s11356-021-14220-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2021] [Accepted: 04/27/2021] [Indexed: 06/12/2023]
Abstract
A multilevel inverter (MLI) is a power electronic device that includes the capability to offer the preferred voltage level (alternating) in the output. Accordingly, selective harmonic elimination (SHE) or pulse width modulation (PWM) methodologies were deployed widely. However, these techniques are not appropriate in a certain situation which involves a huge number of switching angles if an excellent primary guess is not obtainable. Hence, this paper intends to determine the optimum switching angles of a cascaded H-bridge multilevel inverter (CH-MLI) using a hybrid lion optimization algorithm (LOA) and binary cat swarm algorithm (BCSO). The objective of optimizing the switching angle is to generate the needed fundamental voltage and minimize the harmonic content. This is done by resolving the transcendental equations characterizing the harmonic content. The switching angles, i.e., α1, α2.... αm, should be tuned in such a way that it satisfies the condition [Formula: see text]. Here, the optimal tuning of switching angles is done by the proposed binary cat cubpool-based lion algorithm (BCC-LA). In addition, the analysis is done for the proposed method over the state-of-the-art models in terms of total harmonic distortion (THD), and the impact of varying loads is also examined for the proposed and traditional models, and thus, the superior performance of the proposed model is validated.
Affiliation(s)
- Pragya Jain
- Atharva College of Engineering, Mumbai, India.
|
47
|
Jalil Z, Abbasi A, Javed AR, Badruddin Khan M, Abul Hasanat MH, Malik KM, Saudagar AKJ. COVID-19 Related Sentiment Analysis Using State-of-the-Art Machine Learning and Deep Learning Techniques. Front Public Health 2022; 9:812735. [PMID: 35096755 PMCID: PMC8795663 DOI: 10.3389/fpubh.2021.812735] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/10/2021] [Accepted: 12/15/2021] [Indexed: 12/22/2022] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has influenced the everyday life of people around the globe. In general and during lockdown phases, people worldwide use social media networks to state their viewpoints and general feelings concerning the pandemic that has hampered their daily lives. Twitter is one of the most commonly used social media platforms, and it showed a massive increase in tweets related to coronavirus, including positive, negative, and neutral tweets, in a minimal period. Researchers have turned to sentiment analysis to analyze the various emotions of the public toward COVID-19, given the diverse nature of tweets. Meanwhile, people have expressed their feelings regarding the vaccinations' safety and effectiveness on social networking sites such as Twitter. As an advanced step, in this paper, our proposed approach analyzes COVID-19 by focusing on Twitter users who share their opinions on this social media networking site. The proposed approach analyzes collected tweets' sentiments for sentiment classification using various feature sets and classifiers. The early detection of COVID-19 sentiments from collected tweets allows for a better understanding and handling of the pandemic. Tweets are categorized into positive, negative, and neutral sentiment classes. We evaluate the performance of machine learning (ML) and deep learning (DL) classifiers using evaluation metrics (i.e., accuracy, precision, recall, and F1-score). Experiments prove that the proposed approach provides better accuracy of 96.66, 95.22, 94.33, and 93.88% for COVISenti, COVIDSenti_A, COVIDSenti_B, and COVIDSenti_C, respectively, compared to all other methods used in this study as well as compared to the existing approaches and traditional ML and DL algorithms.
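For readers unfamiliar with the evaluation metrics named in this abstract, the following minimal sketch computes accuracy and per-class precision, recall, and F1-score in plain Python. The labels and predictions are hypothetical toy data, not results from the paper, and the choice to return 0.0 when a denominator is zero is an assumption of this sketch.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Returning 0.0 on an empty denominator is a convention chosen here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical three-class sentiment labels and predictions.
y_true = ["pos", "neg", "neu", "pos", "neg", "pos"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neg"]

acc = accuracy(y_true, y_pred)
prec, rec, f1 = precision_recall_f1(y_true, y_pred, positive="pos")
```

With three sentiment classes, the per-class scores would typically be averaged (for example, macro-averaged) to report a single figure; the abstract does not say which averaging was used.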
Affiliation(s)
- Zunera Jalil
- Department of Cyber Security, Air University, Islamabad, Pakistan
- Ahmed Abbasi
- Department of Cyber Security, Air University, Islamabad, Pakistan
- Muhammad Badruddin Khan
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
- Mozaherul Hoque Abul Hasanat
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
- Khalid Mahmood Malik
- Department of Computer Science and Engineering, Oakland University Rochester, Rochester, MI, United States
- Abdul Khader Jilani Saudagar
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
|
48
|
Luu T(JP, Follmann R. The relationship between sentiment score and COVID-19 cases in the United States. J Inf Sci 2022. [DOI: 10.1177/01655515211068167] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
The coronavirus disease (COVID-19) continues to have devastating effects across the globe. No nation has been free from the uncertainty brought by this pandemic. The health, social and economic tolls associated with it are causing strong emotions and spreading fear in people of all ages, genders and races. Since the beginning of the COVID-19 pandemic, many have expressed their feelings and opinions related to a wide range of aspects of their lives via Twitter. In this study, we consider a framework for extracting sentiment scores and opinions from COVID-19–related tweets. We connect users’ sentiment with COVID-19 cases across the United States and investigate the effect of specific COVID-19 milestones on public sentiment. The results of this work may help with the development of pandemic-related legislation, serve as a guide for scientific work, as well as inform and educate the public on core issues related to the pandemic.
|
49
|
Galgoczy MC, Phatak A, Vinson D, Mago VK, Giabbanelli PJ. (Re)shaping online narratives: when bots promote the message of President Trump during his first impeachment. PeerJ Comput Sci 2022; 8:e947. [PMID: 35494820 PMCID: PMC9044321 DOI: 10.7717/peerj-cs.947] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/26/2021] [Accepted: 03/23/2022] [Indexed: 05/12/2023]
Abstract
Influencing and framing debates on Twitter provides power to shape public opinion. Bots have become essential tools of 'computational propaganda' on social media such as Twitter, often contributing to a large fraction of the tweets regarding political events such as elections. Although analyses have been conducted regarding the first impeachment of former president Donald Trump, they have been focused on either a manual examination of relatively few tweets to emphasize rhetoric, or the use of Natural Language Processing (NLP) of a much larger corpus with respect to common metrics such as sentiment. In this paper, we complement existing analyses by examining the role of bots in the first impeachment with respect to three questions as follows. (Q1) Are bots actively involved in the debate? (Q2) Do bots target one political affiliation more than another? (Q3) Which sources are used by bots to support their arguments? Our methods start with collecting over 13M tweets on six key dates, from October 6th 2019 to January 21st 2020. We used machine learning to evaluate the sentiment of the tweets (via BERT) and whether it originates from a bot. We then examined these sentiments with respect to a balanced sample of Democrats and Republicans directly relevant to the impeachment, such as House Speaker Nancy Pelosi, senator Mitch McConnell, and (then former Vice President) Joe Biden. The content of posts from bots was further analyzed with respect to the sources used (with bias ratings from AllSides and Ad Fontes) and themes. Our first finding is that bots have played a significant role in contributing to the overall negative tone of the debate (Q1). Bots were targeting Democrats more than Republicans (Q2), as evidenced both by a difference in ratio (bots had more negative-to-positive tweets on Democrats than Republicans) and in composition (use of derogatory nicknames). 
Finally, the sources provided by bots were almost twice as likely to be from the right than the left, with a noticeable use of hyper-partisan right and most extreme right sources (Q3). Bots were thus purposely used to promote a misleading version of events. Overall, this suggests an intentional use of bots as part of a strategy, thus providing further confirmation that computational propaganda is involved in defining political events in the United States. As any empirical analysis, our work has several limitations. For example, Trump's rhetoric on Twitter has previously been characterized by an overly negative tone, thus tweets detected as negative may be echoing his message rather than acting against him. Previous works show that this possibility is limited, and its existence would only strengthen our conclusions. As our analysis is based on NLP, we focus on processing a large volume of tweets rather than manually reading all of them, thus future studies may complement our approach by using qualitative methods to assess the specific arguments used by bots.
Affiliation(s)
- Michael C. Galgoczy
- Department of Computer Science & Software Engineering, Miami University, Oxford, OH, United States
- Atharva Phatak
- Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
- Danielle Vinson
- Department of Politics & International Affairs, Furman University, Greenville, SC, United States
- Vijay K. Mago
- Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
- Philippe J. Giabbanelli
- Department of Computer Science & Software Engineering, Miami University, Oxford, OH, United States
|
50
|
Beliga S, Martinčić-Ipšić S, Matešić M, Petrijevčanin Vuksanović I, Meštrović A. Infoveillance of the Croatian Online Media During the COVID-19 Pandemic: One-Year Longitudinal Study Using Natural Language Processing. JMIR Public Health Surveill 2021; 7:e31540. [PMID: 34739388 PMCID: PMC8715984 DOI: 10.2196/31540] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/24/2021] [Revised: 08/08/2021] [Accepted: 11/05/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Online media play an important role in public health emergencies and serve as essential communication platforms. Infoveillance of online media during the COVID-19 pandemic is an important step toward gaining a better understanding of crisis communication. OBJECTIVE The goal of this study was to perform a longitudinal analysis of the COVID-19-related content on online media based on natural language processing. METHODS We collected a data set of news articles published by Croatian online media during the first 13 months of the pandemic. First, we tested the correlations between the number of articles and the number of new daily COVID-19 cases. Second, we analyzed the content by extracting the most frequent terms and applied the Jaccard similarity coefficient. Third, we compared the occurrence of the pandemic-related terms during the two waves of the pandemic. Finally, we applied named entity recognition to extract the most frequent entities and tracked the dynamics of changes during the observation period. RESULTS The results showed no significant correlation between the number of articles and the number of new daily COVID-19 cases. Furthermore, there were high overlaps in the terminology used in all articles published during the pandemic with a slight shift in the pandemic-related terms between the first and the second waves. Finally, the findings indicate that the most influential entities have lower overlaps for the identified people and higher overlaps for locations and institutions. CONCLUSIONS Our study shows that online media have a prompt response to the pandemic with a large number of COVID-19-related articles. There was a high overlap in the frequently used terms across the first 13 months, which may indicate the narrow focus of reporting in certain periods. However, the pandemic-related terminology is well-covered.
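The Jaccard similarity coefficient mentioned in the methods is the size of the intersection of two sets divided by the size of their union. The sketch below shows that calculation for two term lists; the term lists are hypothetical and only illustrate how overlap between frequent terms in two periods could be quantified, not how the authors implemented it.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two term collections."""
    set_a, set_b = set(terms_a), set(terms_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical most-frequent terms from two waves of the pandemic.
wave_1_terms = ["virus", "lockdown", "mask", "cases"]
wave_2_terms = ["virus", "vaccine", "mask", "cases"]

overlap = jaccard(wave_1_terms, wave_2_terms)  # 3 shared terms / 5 distinct terms = 0.6
```

A value near 1 indicates that the two periods share almost all of their frequent terms, while a value near 0 indicates little shared terminology.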
Affiliation(s)
- Slobodan Beliga
- Department of Informatics, University of Rijeka, Rijeka, Croatia
- Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Rijeka, Croatia
- Sanda Martinčić-Ipšić
- Department of Informatics, University of Rijeka, Rijeka, Croatia
- Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Rijeka, Croatia
- Mihaela Matešić
- Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Rijeka, Croatia
- Faculty of Humanities and Social Sciences, University of Rijeka, Rijeka, Croatia
- Ana Meštrović
- Department of Informatics, University of Rijeka, Rijeka, Croatia
- Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Rijeka, Croatia
|