1
Giannini F, Marelli M, Stella F, Monzani D, Pancani L. Surfing the OCEAN: The machine learning psycholexical approach 2.0 to detect personality traits in texts. J Pers 2024; 92:1602-1615. [PMID: 38217359 DOI: 10.1111/jopy.12915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 10/11/2023] [Accepted: 12/13/2023] [Indexed: 01/15/2024]
Abstract
OBJECTIVE: We aimed to develop a machine learning model to infer OCEAN traits from text. BACKGROUND: The psycholexical approach allows information about personality traits to be retrieved from human language. However, it has rarely been applied because of methodological and practical issues that current computational advances could overcome. METHOD: Classical taxonomies and a large Yelp corpus were leveraged to learn an embedding for each personality trait. These embeddings were used to train a feedforward neural network to predict trait values. The model's generalization performance was evaluated through two external validation studies involving experts (N = 11) and laypeople (N = 100) in a discrimination task about the best markers of each trait and polarity. RESULTS: Intrinsic validation of the model yielded excellent results, with R² values greater than 0.78. The validation studies showed a high proportion of matches between participants' choices and model predictions, confirming its efficacy in identifying new terms related to the OCEAN traits. The best performance was observed for agreeableness and extraversion, especially for their positive polarities. The model was less effective in identifying the negative polarity of openness and conscientiousness. CONCLUSIONS: This innovative methodology can be considered a "psycholexical approach 2.0," contributing to research in personality and its practical applications in many fields.
Affiliation(s)
- Federico Giannini
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Marco Marelli
  - Department of Psychology, University of Milan-Bicocca, Milan, Italy
- Fabio Stella
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Dario Monzani
  - Department of Psychology, Educational Science and Human Movement, University of Palermo, Palermo, Italy
- Luca Pancani
  - Department of Psychology, University of Milan-Bicocca, Milan, Italy
2
Shahmohammadi H, Heitmeier M, Shafaei-Bajestan E, Lensch HPA, Baayen RH. Language with vision: A study on grounded word and sentence embeddings. Behav Res Methods 2024; 56:5622-5646. [PMID: 38114881 PMCID: PMC11335852 DOI: 10.3758/s13428-023-02294-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/09/2023] [Indexed: 12/21/2023]
Abstract
Grounding language in vision is an active field of research that seeks to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many attempts at language grounding, achieving an optimal balance between textual representations of language and our embodied experiences remains an open problem. Common questions include the following. Is visual grounding advantageous for abstract words, or is its effectiveness restricted to concrete words? What is the optimal way of bridging the gap between text and vision? To what extent is perceptual knowledge from images advantageous for acquiring high-quality embeddings? Leveraging current advances in machine learning and natural language processing, the present study addresses these questions by proposing a simple yet very effective computational grounding model for pre-trained word embeddings. Our model balances the interplay between language and vision by aligning textual embeddings with visual information while simultaneously preserving the distributional statistics that characterize word usage in text corpora. By applying a learned alignment, we are able to indirectly ground unseen words, including abstract words. A series of evaluations on a range of behavioral datasets shows that visual grounding is beneficial not only for concrete words but also for abstract words, lending support to the indirect theory of abstract concepts. Moreover, our approach offers advantages for contextualized embeddings, such as those generated by BERT (Devlin et al., 2018), but only when trained on corpora of modest, cognitively plausible sizes. Code and grounded embeddings for English are available at https://github.com/Hazel1994/Visually_Grounded_Word_Embeddings_2.
4
Alamoodi AH, Zaidan BB, Zaidan AA, Albahri OS, Mohammed KI, Malik RQ, Almahdi EM, Chyad MA, Tareq Z, Albahri AS, Hameed H, Alaa M. Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review. Expert Syst Appl 2021; 167:114155. [PMID: 33139966 PMCID: PMC7591875 DOI: 10.1016/j.eswa.2020.114155] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Received: 04/21/2020] [Revised: 10/23/2020] [Accepted: 10/23/2020] [Indexed: 05/05/2023]
Abstract
The COVID-19 pandemic caused by the novel coronavirus SARS-CoV-2 emerged unexpectedly in China in December 2019. According to the World Health Organisation, tens of millions of confirmed cases and hundreds of thousands of confirmed deaths have been reported worldwide. News about the virus is spreading across social media websites, and these outlets consequently reflect a wide range of views, opinions and emotions during outbreak-related incidents. For computer scientists and researchers, such big data are valuable assets for understanding people's sentiments regarding current events, especially those related to the pandemic, so analysing these sentiments can yield important findings. To the best of our knowledge, previous related studies have focused on a single kind of infectious disease; no previous study has examined multiple diseases via sentiment analysis. Accordingly, this research aimed to review and analyse articles about the occurrence of different types of infectious diseases, such as epidemics, pandemics, viruses or outbreaks, during the last 10 years, to understand the application of sentiment analysis and to obtain the most important findings in the literature. Articles on related topics were systematically searched in five major databases, namely, ScienceDirect, PubMed, Web of Science, IEEE Xplore and Scopus, from 1 January 2010 to 30 June 2020. These indices were considered sufficiently extensive and reliable to cover the scope of the literature. Articles were selected based on the inclusion and exclusion criteria of the systematic review, yielding a total of 28 articles. These articles were organised into a coherent taxonomy describing current standpoints in the literature according to four main categories: lexicon-based models, machine learning-based models, hybrid-based models and individuals.
The selected articles were further analysed in terms of motivations related to disease mitigation and data analysis, and of challenges faced by researchers with respect to data, social media platforms and community. Other aspects, such as the protocol followed by the systematic review and demographic statistics on the distribution of the literature, were also included in the review. Interesting patterns were observed in the literature, and the identified articles were grouped accordingly. This study highlights the current state of and opportunities for research in this area and encourages further efforts towards understanding this research field.
Affiliation(s)
- A H Alamoodi
  - Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- B B Zaidan
  - Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
  - Future Technology Research Center, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin 64002, Taiwan, ROC
- A A Zaidan
  - Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- O S Albahri
  - Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- K I Mohammed
  - Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- R Q Malik
  - Department of Engineering Technology, Universiti Tun Hussein Onn (UTHM), Batu Pahat, Malaysia
- E M Almahdi
  - Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- M A Chyad
  - Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- Z Tareq
  - Department of Computer Science, Computer Science and Mathematics College, Tikrit University, Tikrit 34001, Iraq
- A S Albahri
  - Informatics Institute for Postgraduate Studies (IIPS), Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq
- Hamsa Hameed
  - Faculty of Human Development, Sultan Idris University of Education (UPSI), Tanjung Malim, Malaysia
- Musaab Alaa
  - Faculty of Languages and Communication, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia