1
Emegano DI, Duwa BB, Usman AG, Ahmad H, Ozsahin DU, Askar S. A comparative study on TB incidence and HIV-TB coinfection using machine learning models on WHO global TB dataset. Sci Rep 2025; 15:13690. [PMID: 40258881 PMCID: PMC12012007 DOI: 10.1038/s41598-025-94378-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 03/13/2025] [Indexed: 04/23/2025] Open
Abstract
Tuberculosis (TB), a deadly and contagious disease caused by Mycobacterium tuberculosis, remains a significant global public health threat. HIV co-infection significantly increases the risk of active TB recurrence and prolongs TB treatment. The study uses advanced machine learning (ML) techniques to predict TB incidence and HIV-TB co-infection from the 2023 World Health Organization (WHO) Global TB burden database. The two main target variables in the dataset are the estimated incidence of all forms of tuberculosis per 100,000 people (e_inc_100k) and the estimated incidence of HIV-positive tuberculosis per 100,000 people (e_inc_tbhiv_100k). Model performance was evaluated with accuracy, precision, recall, F1 score, and the Receiver Operating Characteristic-Area Under the Curve (ROC-AUC). For e_inc_100k, the Extreme Gradient Boosting (XGB) model outperformed the other models, with 99.7% accuracy, 99.80% precision, 99.6% recall, a 99.7% F1 score, and a 99.7% ROC-AUC score. For e_inc_tbhiv_100k, the Gradient Boosting (GB) model performed best, with 98.58% accuracy, 98.32% precision, 98.73% recall, a 98.53% F1 score, and a 98.58% ROC-AUC score. Finally, the study aligns with the UNAIDS and WHO End TB Strategy, indicating progress in combating TB and TB-HIV co-infection in public health workflows.
Affiliation(s)
- Declan I Emegano
  - Operational Research Center in Healthcare, Near East University, Nicosia/TRNC, Mersin 10, 99138, Turkey
  - Department of Biomedical Engineering, Near East University, Nicosia/TRNC, Mersin 10, 99138, Turkey
- Basil B Duwa
  - Operational Research Center in Healthcare, Near East University, Nicosia/TRNC, Mersin 10, 99138, Turkey
  - Department of Biomedical Engineering, Near East University, Nicosia/TRNC, Mersin 10, 99138, Turkey
- A G Usman
  - Operational Research Center in Healthcare, Near East University, Nicosia/TRNC, Mersin 10, 99138, Turkey
- Hijaz Ahmad
  - Operational Research Center in Healthcare, Near East University, Nicosia/TRNC, Mersin 10, 99138, Turkey
  - Department of Mathematics, College of Science, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
  - Department of Technical Sciences, Western Caspian University, Baku 1001, Azerbaijan
- Dilber Uzun Ozsahin
  - Operational Research Center in Healthcare, Near East University, Nicosia/TRNC, Mersin 10, 99138, Turkey
  - Department of Medical Diagnostic Imaging, College of Health Science, University of Sharjah, Sharjah, UAE
  - Research Institute for Medical and Health Sciences, University of Sharjah, Sharjah, UAE
- Sameh Askar
  - Department of Statistics and Operations Research, College of Science, King Saud University, PO Box 2455, Riyadh 11451, Saudi Arabia
2
Towards improving e-commerce customer review analysis for sentiment detection. Sci Rep 2022; 12:21983. [PMID: 36539524 PMCID: PMC9764295 DOI: 10.1038/s41598-022-26432-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 12/14/2022] [Indexed: 12/24/2022] Open
Abstract
According to a report published by Business Wire, the market value of e-commerce reached US$ 13 trillion and is expected to reach US$ 55.6 trillion by 2027. In this rapidly growing market, product and service reviews can influence purchasing decisions. It is challenging to evaluate reviews manually in order to make decisions and examine business models. However, users can examine and automate this process with Natural Language Processing (NLP). NLP is a well-known technique for evaluating and extracting information from written or audible texts. NLP research investigates the social architecture of societies. This article analyses the Amazon dataset using various combinations of voice components and deep learning. The suggested module focuses on identifying sentences as 'Positive', 'Neutral', 'Negative', or 'Indifferent'. It analyses the data and labels the 'better' and 'worse' assumptions as positive and negative, respectively. With the expansion of the internet and e-commerce websites over the past decade, consumers now have a vast selection of products within the same domain, and NLP plays a vital part in classifying products based on evaluations. It is possible to predict sponsored and unpaid reviews using NLP with Machine Learning. This article examines various Machine Learning algorithms for predicting the sentiment of e-commerce website reviews. The automation achieves a maximum validation accuracy of 79.83% when using fastText as the word embedding with a Multi-channel Convolutional Neural Network.
3
Pair E, Vicas N, Weber AM, Meausoone V, Zou J, Njuguna A, Darmstadt GL. Quantification of Gender Bias and Sentiment Toward Political Leaders Over 20 Years of Kenyan News Using Natural Language Processing. Front Psychol 2021; 12:712646. [PMID: 34955949 PMCID: PMC8703202 DOI: 10.3389/fpsyg.2021.712646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2021] [Accepted: 11/17/2021] [Indexed: 11/16/2022] Open
Abstract
Background: Despite a 2010 Kenyan constitutional amendment limiting members of elected public bodies to < two-thirds of the same gender, only 22 percent of the 12th Parliament members inaugurated in 2017 were women. Investigating gender bias in the media is a useful tool for understanding socio-cultural barriers to implementing legislation for gender equality. Natural language processing (NLP) methods, such as word embedding and sentiment analysis, can efficiently quantify media biases at a scope previously unavailable in the social sciences. Methods: We trained GloVe and word2vec word embeddings on text from 1998 to 2019 from Kenya’s Daily Nation newspaper. We measured gender bias in these embeddings and used sentiment analysis to predict quantitative sentiment scores for sentences surrounding female leader names compared to male leader names. Results: Bias in leadership words for men and women measured from Daily Nation word embeddings corresponded to temporal trends in men and women’s participation in political leadership (i.e., parliamentary seats) using GloVe (correlation 0.8936, p = 0.0067, r² = 0.799) and word2vec (correlation 0.844, p = 0.0169, r² = 0.712) algorithms. Women continue to be associated with domestic terms while men continue to be associated with influence terms, for both regular gender words and female and male political leaders’ names. Male words (e.g., he, him, man) were mentioned 1.84 million more times than female words from 1998 to 2019. Sentiment analysis showed an increase in relative negative sentiment associated with female leaders (p = 0.0152) and an increase in positive sentiment associated with male leaders over time (p = 0.0216). Conclusion: Natural language processing is a powerful method for gaining insights into and quantifying trends in gender biases and sentiment in news media. We found evidence of improvement in gender equality but also a backlash from increased female representation in high-level governmental leadership.
Affiliation(s)
- Emma Pair
  - Department of Pediatrics, Global Center for Gender Equality, School of Medicine, Stanford University, Stanford, CA, United States
- Nikitha Vicas
  - Department of Neuroscience, University of Texas - Dallas, Dallas, TX, United States
- Ann M Weber
  - School of Public Health, University of Nevada, Reno, NV, United States
- Valerie Meausoone
  - Research Computing Center, Stanford University, Stanford, CA, United States
- James Zou
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Amos Njuguna
  - School of Graduate Studies, Research and Extension, United States International University - Africa, Nairobi, Kenya
- Gary L Darmstadt
  - Department of Pediatrics, Global Center for Gender Equality, School of Medicine, Stanford University, Stanford, CA, United States
4
Chinnalagu A, Durairaj AK. Context-based sentiment analysis on customer reviews using machine learning linear models. PeerJ Comput Sci 2021; 7:e813. [PMID: 35036535 PMCID: PMC8725657 DOI: 10.7717/peerj-cs.813] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/06/2021] [Accepted: 11/22/2021] [Indexed: 06/14/2023]
Abstract
Customer satisfaction and positive customer sentiment are among the goals of successful companies. However, analyzing customer reviews to predict sentiment accurately has proven challenging and time-consuming because of the high volume of data collected from various sources. Researchers have approached this problem with a range of algorithms, methods, and models. These include machine learning and deep learning (DL) methods, unigram and skip-gram based algorithms, as well as the Artificial Neural Network (ANN) and bag-of-words (BOW) regression model. Studies have revealed incoherence in polarity, model overfitting and performance issues, as well as high data processing costs. This experiment was conducted to address these issues by building a high-performance yet cost-effective model for predicting accurate sentiments from large datasets containing customer reviews. The model uses the fastText library from Facebook's AI Research (FAIR) lab, as well as the traditional Linear Support Vector Machine (LSVM), to classify text and word embeddings. The model was also compared with the authors' custom multi-layer Sentiment Analysis (SA) Bi-directional Long Short-Term Memory (SA-BLSTM) model. Based on the results, the proposed fastText model obtains a higher accuracy of 90.71%, as well as a 20% gain in performance, compared to the LSVM and SA-BLSTM models.
Affiliation(s)
- Anandan Chinnalagu
  - Computer Science, Government Arts College (Affiliated to Bharathidasan University, Tiruchirappalli), Kulithalai, Karur, Tamil Nadu, India
- Ashok Kumar Durairaj
  - Computer Science, Government Arts College (Affiliated to Bharathidasan University, Tiruchirappalli), Kulithalai, Karur, Tamil Nadu, India