1
Papia SK, Khan MA, Habib T, Rahman M, Islam MN. DistilRoBiLSTMFuse: an efficient hybrid deep learning approach for sentiment analysis. PeerJ Comput Sci 2024; 10:e2349. [PMID: 39650469 PMCID: PMC11623128 DOI: 10.7717/peerj-cs.2349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 08/31/2024] [Indexed: 12/11/2024]
Abstract
Social media has become part of daily routines, providing a platform for individuals to express their opinions and emotions openly on the internet. Within this digital domain, sentiment analysis (SA) is a vital tool for understanding the emotions conveyed in written text, whether positive, negative, or neutral. However, SA faces challenges such as diverse language, imbalanced data, and complex sentences. This study proposes a hybrid architecture for SA named DistilRoBiLSTMFuse, designed to extract deep contextual information from complex sentences and accurately identify sentiments. We evaluate the model's performance using two popular benchmark datasets: IMDb and Twitter USAirline Sentiment. The raw text data are preprocessed in several steps: (1) implementing a comprehensive data cleaning protocol to remove noise and unnecessary information from the raw text, (2) preparing a custom list of stopwords to retain essential words while omitting common, non-informative words, and (3) applying lemmatization to reduce words to their base forms, which makes the text more consistent and improves the accuracy of text analysis. To address class imbalance, this study uses oversampling, augmenting minority class samples to match the majority and thereby ensuring uniform representation across all categories. Because preprocessing techniques vary across previous studies, our research first explores the efficacy of seven distinct machine learning (ML) models paired with two commonly employed feature transformation methods: term frequency-inverse document frequency (TF-IDF) and bag of words (BoW). This comparison determines which combination yields optimal performance within these ML frameworks.
The DistilRoBiLSTMFuse model is evaluated on both datasets and consistently surpasses existing state-of-the-art approaches. On the IMDb dataset, it achieves 98.91% accuracy in training, 94.16% in validation, and 93.97% in testing. On the Twitter USAirline Sentiment dataset, it reaches 99.42% accuracy in training, 98.52% in validation, and 98.33% in testing. The experimental results demonstrate the effectiveness of the hybrid DistilRoBiLSTMFuse model in SA tasks. The code for this experimental analysis is publicly available via the following DOI: https://doi.org/10.5281/zenodo.13255008.
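The abstract does not say which oversampling variant was used. As a minimal sketch, assuming plain random oversampling with replacement, the balancing step could look like the following. The function name `random_oversample` and the toy airline tweets are hypothetical and are not taken from the paper or its code release.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (assumed variant)."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the majority class.
    target = max(len(samples) for samples in by_class.values())

    out_texts, out_labels = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for text in samples + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, made-up data for illustration only.
texts = ["late again", "lost my bag", "rude staff", "great crew", "on time"]
labels = ["negative", "negative", "negative", "positive", "neutral"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
class_counts = Counter(balanced_labels)
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated samples do not leak into validation or test data.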
Affiliation(s)
- Sonia Khan Papia
  - Information Technology, Washington University of Science & Technology, Alexandria, VA, United States of America
- Md Asif Khan
  - International Relations, University of Dhaka, Dhaka, Bangladesh
- Tanvir Habib
  - International Relations, University of Dhaka, Dhaka, Bangladesh
- Mizanur Rahman
  - School of Computer Science, Western Illinois University, Macomb, IL, United States of America
- Md Nahidul Islam
  - Faculty of Electrical and Electronic Engineering, Universiti Malaysia Pahang Al-Sultan Abdullah, PEAKN, Malaysia
2
Levett JJ, Elkaim LM, Weber MH, Yuh SJ, Lasry O, Alotaibi NM, Georgiopoulos M, Berven SH, Weil AG. A twitter analysis of patient and family experience in pediatric spine surgery. Childs Nerv Syst 2023; 39:3483-3490. [PMID: 37354288 DOI: 10.1007/s00381-023-06019-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 06/04/2023] [Indexed: 06/26/2023]
Abstract
BACKGROUND There are few data on patient and caregiver perceptions of spine surgery in children and youth. This study aims to characterize the personal experiences of patients, caregivers, and family members surrounding pediatric spine surgery through a qualitative and quantitative social media analysis. METHODS The Twitter application programming interface was searched for keywords related to pediatric spine surgery from inception to March 2022. Relevant tweets and accounts were extracted and subsequently classified using thematic labels. Tweet metadata was collected to measure user engagement via multivariable regression. Sentiment analysis using Natural Language Processing was performed on all tweets with a focus on tweets discussing the personal experiences of patients and caregivers. RESULTS A total of 2424 tweets from 1847 individual accounts were retrieved for analysis. Patients and caregivers represented 1459 (79.0%) of all accounts. Posts discussed the personal experiences of patients and caregivers in 83.5% of tweets. Pediatric spine surgery research was discussed in few posts (n=90, 3.7%). Within the personal experience category, 975 (48.17%) tweets were positive, 516 (25.49%) were negative, and 533 (26.34%) were neutral. The presence of a tag (beta: -6.1, 95% CI -9.7 to -2.5) was significantly associated with lower tweet engagement, and baseline follower count (beta<0.001, 95% CI <0.001 to <0.001) with higher engagement. CONCLUSIONS Patients and caregivers actively discuss topics related to pediatric spine surgery on Twitter. Posts discussing personal experience are most prevalent, while posts on research are scarce, in contrast to previous social media studies. Pediatric spine surgeons can leverage this dialogue to better understand the worries and needs of patients and their families.
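The counts and percentages in the RESULTS can be cross-checked with simple arithmetic. The sketch below recomputes each share from the counts given in the abstract, assuming that the personal-experience denominator is the sum of the three sentiment counts (975 + 516 + 533 = 2024), which is consistent with the reported 83.5% of 2424 tweets. The helper name `share` is an illustrative choice.

```python
def share(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole


total_tweets = 2424
total_accounts = 1847
positive, negative, neutral = 975, 516, 533

# Assumption: the personal-experience category is exactly these three groups.
personal = positive + negative + neutral

# (description, numerator, denominator, percentage reported in the abstract)
reported = [
    ("patient/caregiver accounts", 1459, total_accounts, 79.0),
    ("personal-experience tweets", personal, total_tweets, 83.5),
    ("research tweets", 90, total_tweets, 3.7),
    ("positive tweets", positive, personal, 48.17),
    ("negative tweets", negative, personal, 25.49),
    ("neutral tweets", neutral, personal, 26.34),
]

# Largest difference between a recomputed and a reported percentage.
max_gap = max(abs(share(part, whole) - pct) for _, part, whole, pct in reported)

for name, part, whole, pct in reported:
    print(f"{name}: computed {share(part, whole):.2f}%, reported {pct}%")
```

Under that assumption, every recomputed share agrees with the reported figure to within rounding.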
Affiliation(s)
- Jordan J Levett
  - Faculty of Medicine, University of Montreal, Montreal, Quebec, Canada
- Lior M Elkaim
  - Department of Neurology and Neurosurgery, McGill University, Montreal, Quebec, Canada
- Michael H Weber
  - Department of Orthopaedic Surgery, McGill University, Montreal, Quebec, Canada
- Sung-Joo Yuh
  - Department of Neurosurgery, Centre Hospitalier de l'Université de Montréal, Montreal, Quebec, Canada
- Oliver Lasry
  - Department of Neurology and Neurosurgery, McGill University, Montreal, Quebec, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
- Naif M Alotaibi
  - Department of Neurosurgery, King Fahad Medical City, National Neuroscience Institute, Riyadh, Saudi Arabia
- Sigurd H Berven
  - Department of Orthopaedic Surgery, University of California San Francisco, San Francisco, California, United States
- Alexander G Weil
  - Centre Hospitalier Universitaire Sainte-Justine, Montreal, Quebec, Canada
3
Kaminska O, Cornelis C, Hoste V. Fuzzy Rough Nearest Neighbour Methods for Detecting Emotions, Hate Speech and Irony. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
4
Shamoi E, Turdybay A, Shamoi P, Akhmetov I, Jaxylykova A, Pak A. Sentiment analysis of vegan related tweets using mutual information for feature selection. PeerJ Comput Sci 2022; 8:e1149. [PMID: 36532810 PMCID: PMC9748844 DOI: 10.7717/peerj-cs.1149] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/10/2022] [Accepted: 10/17/2022] [Indexed: 06/17/2023]
Abstract
People are increasingly attached to social media to connect with others, to study, and to work. The presented article uses Twitter posts to better understand public opinion regarding the vegan (plant-based) diet, which has traditionally been portrayed negatively on social media. However, in recent years, studies on health benefits, COVID-19, and global warming have increased the awareness of plant-based diets. The study employs a dataset derived from a collection of vegan-related tweets and uses a sentiment analysis technique to identify the emotions represented in them. The purpose of sentiment analysis is to determine whether a piece of text (a tweet in our case) conveys a negative or positive viewpoint. We use the mutual information approach to perform feature selection in this study. We chose this method because it is suitable for mining complicated features from vegan tweets and extracting users' feelings and emotions. The results revealed that the vegan diet is becoming more popular and is currently framed more positively than in previous years. However, the emotion of fear remained strong throughout the period, in sharp contrast to other types of emotions. Our findings place new information in the public domain, which has significant implications. The article provides evidence that the vegan trend is growing and offers new insights into the key emotions associated with this growth from 2010 to 2022. By gaining a deeper understanding of the public perception of veganism, medical experts can create appropriate health programs and encourage more people to stick to a healthy vegan diet. These results can be used to devise appropriate government action plans to promote healthy veganism and reduce the associated emotion of fear.
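The abstract names mutual information as the feature selection criterion but gives no implementation details. As a rough sketch, the mutual information between a term's presence in a tweet and the tweet's sentiment label can be computed as I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y))). The whitespace tokenization, the binary labels, and the toy tweets below are assumptions made for illustration, not the authors' setup.

```python
import math
from collections import Counter


def mutual_information(term, docs, labels):
    """Mutual information (in bits) between binary presence of `term`
    in a document and the document's class label."""
    n = len(docs)
    presence = [term in doc.lower().split() for doc in docs]

    joint = Counter(zip(presence, labels))   # counts of (present?, label)
    count_x = Counter(presence)              # counts of present / absent
    count_y = Counter(labels)                # counts per label

    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Toy, made-up tweets for illustration only.
docs = [
    "love my vegan dinner",
    "love vegan burgers",
    "vegan food is bland",
    "tired of vegan hype",
]
labels = ["positive", "positive", "negative", "negative"]

vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
scores = {term: mutual_information(term, docs, labels) for term in vocabulary}

# Keep the highest-scoring terms as features.
top_terms = sorted(vocabulary, key=lambda term: scores[term], reverse=True)[:3]
```

Here "love" appears in exactly the positive tweets and scores highest, while "vegan" appears in every tweet and carries no information about the label, so it scores zero.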
Affiliation(s)
- Elvina Shamoi
  - School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan
- Akniyet Turdybay
  - School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan
- Pakizar Shamoi
  - School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan
- Iskander Akhmetov
  - School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan
  - Institute of Information and Computational Technologies, Almaty, Kazakhstan
- Assel Jaxylykova
  - Institute of Information and Computational Technologies, Almaty, Kazakhstan
  - Kazakh National University, Almaty, Kazakhstan
- Alexandr Pak
  - School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan
  - Institute of Information and Computational Technologies, Almaty, Kazakhstan
5
Yenkikar A, Babu CN, Hemanth DJ. Semantic relational machine learning model for sentiment analysis using cascade feature selection and heterogeneous classifier ensemble. PeerJ Comput Sci 2022; 8:e1100. [PMID: 36262147 PMCID: PMC9575864 DOI: 10.7717/peerj-cs.1100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/21/2022] [Accepted: 08/23/2022] [Indexed: 06/16/2023]
Abstract
The exponential rise in social media via microblogging sites like Twitter has sparked interest in sentiment analysis that exploits user feedback on a targeted product or service. Considering its significance in business intelligence and decision-making, numerous efforts have been made in this area. However, a lack of dictionaries, unannotated data, large-scale unstructured data, and low accuracies have plagued these approaches. Also, sentiment classification through classifier ensembles has been underexplored in the literature. In this article, we propose a Semantic Relational Machine Learning (SRML) model that automatically classifies the sentiment of tweets by using a classifier ensemble and optimal features. The model employs the Cascaded Feature Selection (CFS) strategy, a novel statistical assessment approach based on the Wilcoxon rank sum test, a univariate logistic regression assisted significant predictor test, and a cross-correlation test. It further uses word2vec-based continuous bag-of-words and n-gram feature extraction in conjunction with SentiWordNet to find optimal features for classification. We experiment on six public Twitter sentiment datasets: the STS-Gold dataset, the Obama-McCain Debate (OMD) dataset, the healthcare reform (HCR) dataset, and the SemEval2017 Task 4A, 4B and 4C datasets, using a heterogeneous classifier ensemble comprising fourteen individual classifiers from different paradigms. Results from the experimental study indicate that CFS helps attain higher classification accuracy with up to 50% fewer features than the count vectorizer approach. In intra-model performance assessment, the Artificial Neural Network-Gradient Descent (ANN-GD) classifier performs comparatively better than other individual classifiers, but the Best Trained Ensemble (BTE) strategy outperforms it on all metrics.
In inter-model performance assessment against existing state-of-the-art systems, the proposed model achieves higher accuracy and outperforms more accomplished models employing quantum-inspired sentiment representation (QSR), transformer-based methods like BERT, BERTweet, and RoBERTa, and ensemble techniques. The research thus provides critical insights into implementing a similar strategy to build more generic and robust expert systems for sentiment analysis that can be leveraged across industries.
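The abstract does not describe how the Best Trained Ensemble combines its fourteen classifiers. Purely as an illustration of the general idea of a heterogeneous ensemble, the sketch below uses simple hard majority voting, which is a swapped-in generic technique and not the authors' BTE strategy. The three rule-based classifiers are hypothetical stand-ins for trained models.

```python
from collections import Counter


def majority_vote(classifiers, text):
    """Return the label predicted by most classifiers (hard voting).
    On a tie, the label that was voted for first wins."""
    votes = [classify(text) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy classifiers from different "paradigms".
def lexicon_classifier(text):
    return "positive" if "good" in text.lower() else "negative"


def emoticon_classifier(text):
    return "positive" if ":)" in text else "negative"


def negation_classifier(text):
    return "negative" if "not" in text.lower().split() else "positive"


ensemble = [lexicon_classifier, emoticon_classifier, negation_classifier]

first_label = majority_vote(ensemble, "good service :)")
second_label = majority_vote(ensemble, "not good at all")
```

In the second example the lexicon rule is fooled by the word "good", but the other two classifiers outvote it, which is the basic motivation for combining classifiers with different failure modes.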
Affiliation(s)
- Anuradha Yenkikar
  - Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India
- C. Narendra Babu
  - Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India
- D. Jude Hemanth
  - Department of Electronics and Communications Engineering, Karunya University, Coimbatore, Tamil Nadu, India