1
Petilli MA, Marelli M, Mazzoni G, Marchetti M, Rinaldi L, Gatti D. From vector spaces to DRM lists: False Memory Generator, a software for automated generation of lists of stimuli inducing false memories. Behav Res Methods 2024:10.3758/s13428-024-02425-0. [PMID: 38710986 DOI: 10.3758/s13428-024-02425-0] [Accepted: 04/07/2024] [Indexed: 05/08/2024]
Abstract
The formation of false memories is one of the most widely studied topics in cognitive psychology. The Deese-Roediger-McDermott (DRM) paradigm is a powerful tool for investigating false memories and revealing the cognitive mechanisms subserving their formation. In this task, participants first memorize a list of words (encoding phase) and next have to indicate whether words presented in a new list were part of the initially memorized one (recognition phase). By employing DRM lists optimized to investigate semantic effects, previous studies highlighted a crucial role of semantic processes in false memory generation, showing that new words semantically related to the studied ones are more often erroneously recognized than new words that are less semantically related. Despite the strengths of the DRM task, this paradigm faces a major limitation in list construction due to its reliance on human-based association norms, posing both practical and theoretical concerns. To address these issues, we developed the False Memory Generator (FMG), an automated and data-driven tool for generating DRM lists, which exploits similarity relationships between items populating a vector space. Here, we present FMG and demonstrate the validity of the generated lists by successfully replicating well-known semantic effects on false memory production. FMG potentially has broad applications, as it allows false memory production to be tested in domains that go well beyond the current possibilities: in principle, it can be applied to any vector space encoding properties related to word referents (e.g., lexical, orthographic, phonological, sensory, affective, etc.) or to other types of stimuli (e.g., images, sounds, etc.).
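The abstract describes FMG as building DRM lists from similarity relationships in a vector space. The sketch below is a minimal illustration of that idea, not FMG's actual procedure: it assumes list items are simply the k nearest neighbors of a critical lure by cosine similarity, and the names `cosine`, `drm_list`, and the toy three-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def drm_list(space, lure, k):
    """Return the k words most similar to the critical lure, most similar first.

    The lure itself is excluded: in the DRM paradigm it is never studied,
    only tested as a new (non-studied) item.
    """
    target = space[lure]
    scored = [(word, cosine(target, vec)) for word, vec in space.items() if word != lure]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]

# Hypothetical toy vectors; a real application would load a trained vector space.
toy_space = {
    "sleep": [0.9, 0.1, 0.0],
    "bed": [0.8, 0.2, 0.1],
    "rest": [0.85, 0.05, 0.1],
    "dream": [0.7, 0.3, 0.0],
    "car": [0.0, 0.1, 0.9],
    "road": [0.1, 0.0, 0.95],
}

print(drm_list(toy_space, "sleep", 3))  # ['rest', 'bed', 'dream']
```

With these toy vectors the list for the lure "sleep" contains "rest", "bed", and "dream", while the unrelated "car" and "road" are left out. Any further constraints FMG may place on candidate items are not modeled here.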
Affiliation(s)
- Marco A Petilli
  - Department of Psychology, University of Milano-Bicocca, Milan, Italy
- Marco Marelli
  - Department of Psychology, University of Milano-Bicocca, Milan, Italy
  - NeuroMI, Milan Center for Neuroscience, Milan, Italy
- Giuliana Mazzoni
  - Department of Health, Dynamic and Clinical Psychology, University of Sapienza, Rome, Italy
  - Department of Psychology, University of Hull, Hull, UK
- Michela Marchetti
  - Department of Health, Dynamic and Clinical Psychology, University of Sapienza, Rome, Italy
- Luca Rinaldi
  - Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
  - Cognitive Psychology Unit, IRCCS Mondino Foundation, Pavia, Italy
- Daniele Gatti
  - Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy

2
Gatti D, Raveling L, Petrenco A, Günther F. Valence without meaning: Investigating form and semantic components in pseudowords valence. Psychon Bull Rev 2024:10.3758/s13423-024-02487-3. [PMID: 38565840 DOI: 10.3758/s13423-024-02487-3] [Accepted: 03/05/2024] [Indexed: 04/04/2024]
Abstract
Valence is a dominant semantic dimension, and it is fundamentally linked to basic approach-avoidance behavior within a broad range of contexts. Previous studies have shown that it is possible to approximate the valence of existing words based on several surface-level and semantic components of the stimuli. In parallel, recent studies have shown that even completely novel and (apparently) meaningless stimuli, like pseudowords, can be informative of meaning based on the information that they carry at the subword level. Here, we aimed to further extend this evidence by investigating whether humans can reliably assign valence to pseudowords and, additionally, by identifying the factors explaining such valence judgments. In Experiment 1, we trained several models to predict valence judgments for existing words from their combined form and meaning information. Then, in Experiment 2 and Experiment 3, we extended the results by predicting participants' valence judgments for pseudowords, using a set of models indexing different (possible) sources of valence, and selected the best-performing model in a completely data-driven procedure. Results showed that the model including basic surface-level information (i.e., the letters composing the pseudoword) and orthographic-neighbor information performed best, thus tracing pseudoword valence back to these components. These findings support perspectives on the nonarbitrariness of language and provide insights regarding how humans process the valence of novel stimuli.
Affiliation(s)
- Daniele Gatti
  - Department of Brain and Behavioral Sciences, University of Pavia, Piazza Botta 6, 27100, Pavia, Italy
- Laura Raveling
  - Institut für Psychologie, Humboldt-Universität zu Berlin, Berlin, Germany
- Aliona Petrenco
  - Institut für Psychologie, Humboldt-Universität zu Berlin, Berlin, Germany
- Fritz Günther
  - Institut für Psychologie, Humboldt-Universität zu Berlin, Berlin, Germany

3
Hagoort P, Özyürek A. Extending the Architecture of Language From a Multimodal Perspective. Top Cogn Sci 2024. [PMID: 38493475 DOI: 10.1111/tops.12728] [Received: 09/24/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/19/2024]
Abstract
Language is inherently multimodal. In spoken languages, combined spoken and visual signals (e.g., co-speech gestures) are an integral part of linguistic structure and language representation. This requires an extension of the parallel architecture, which needs to include the visual signals concomitant to speech. We present the evidence for the multimodality of language. In addition, we propose that distributional semantics might provide a format for integrating speech and co-speech gestures in a common semantic representation.
Affiliation(s)
- Peter Hagoort
  - Max Planck Institute for Psycholinguistics, Nijmegen
  - Donders Institute for Brain, Cognition and Behaviour, Nijmegen
- Aslı Özyürek
  - Max Planck Institute for Psycholinguistics, Nijmegen
  - Donders Institute for Brain, Cognition and Behaviour, Nijmegen

4
Fernandino L, Conant LL. The Primacy of Experience in Language Processing: Semantic Priming Is Driven Primarily by Experiential Similarity. bioRxiv 2023:2023.03.21.533703. [PMID: 36993310 PMCID: PMC10055357 DOI: 10.1101/2023.03.21.533703] [Indexed: 03/30/2023]
Abstract
The organization of semantic memory, including memory for word meanings, has long been a central question in cognitive science. Although there is general agreement that word meaning representations must make contact with sensory-motor and affective experiences in a non-arbitrary fashion, the nature of this relationship remains controversial. One prominent view proposes that word meanings are represented directly in terms of their experiential content (i.e., sensory-motor and affective representations). Opponents of this view argue that the representation of word meanings reflects primarily taxonomic structure, that is, their relationships to natural categories. In addition, the recent success of language models based on word co-occurrence (i.e., distributional) information in emulating human linguistic behavior has led to proposals that this kind of information may play an important role in the representation of lexical concepts. We used a semantic priming paradigm designed for representational similarity analysis (RSA) to quantitatively assess how well each of these theories explains the representational similarity pattern for a large set of words. Crucially, we used partial correlation RSA to account for intercorrelations between model predictions, which allowed us to assess, for the first time, the unique effect of each model. Semantic priming was driven primarily by experiential similarity between prime and target, with no evidence of an independent effect of distributional or taxonomic similarity. Furthermore, only the experiential models accounted for unique variance in priming after partialling out explicit similarity ratings. These results support experiential accounts of semantic representation and indicate that, despite their good performance at some linguistic tasks, the distributional models evaluated here do not encode the same kind of information used by the human semantic system.
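Partial correlation RSA, as described above, estimates the unique association between a behavioral similarity measure (here, priming) and one model's predictions after removing what a second model shares with both. A minimal sketch, assuming a single control model, Pearson correlations, and hypothetical toy vectors standing in for vectorized dissimilarity matrices; the authors' actual pipeline (choice of correlation coefficient, number of partialled models) may differ.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation between x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data standing in for vectorized (dis)similarity values:
# x = behavioral measure, y = predictions of model A, z = predictions of model B.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]  # z plus a component unrelated to z
y = [0, 5, 0, 5]  # z plus a different component, unrelated to z and to x's extra part

print(pearson(x, y))          # 1/3: x and y share variance only through z
print(partial_corr(x, y, z))  # zero up to floating-point error
```

In this toy case `x` and `y` are correlated only because both contain `z`; once `z` is partialled out, no unique association remains, which is the logic used to ask whether a model explains variance beyond its competitors.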
Affiliation(s)
- Leonardo Fernandino
  - Department of Neurology, Medical College of Wisconsin
  - Department of Biomedical Engineering, Medical College of Wisconsin

5
Wang T, Xu X. The good, the bad, and the ambivalent: Extrapolating affective values for 38,000+ Chinese words via a computational model. Behav Res Methods 2023:10.3758/s13428-023-02274-3. [PMID: 37968560 DOI: 10.3758/s13428-023-02274-3] [Accepted: 10/16/2023] [Indexed: 11/17/2023]
Abstract
Word affective ratings are important tools in psycholinguistic research, natural language processing, and many other fields. However, even for well-studied languages, such norms are usually limited in scale. To extrapolate affective (i.e., valence and arousal) values for words in the SUBTLEX-CH database (Cai & Brysbaert, 2010, PLoS ONE, 5(6):e10729), we implemented a computational neural network which captured how words' vector-based semantic representations corresponded to the probability densities of their valence and arousal. Based on these probability density functions, we predicted not only a word's affective values, but also their respective degrees of variability that could characterize individual differences in human affective ratings. The resulting estimates of affective values largely converged with human ratings for both valence and arousal, and the estimated degrees of variability also captured important features of the variability in human ratings. We released the extrapolated affective values, together with their corresponding degrees of variability, for over 38,000 Chinese words in the Open Science Framework (https://osf.io/s9zmd/). We also discussed how the view of embodied cognition could be illuminated by this computational model.
Affiliation(s)
- Tianqi Wang
  - School of Foreign Languages, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, 200240, China
  - Speech Science Laboratory, The University of Hong Kong, Hong Kong, China
  - Academic Unit of Human Communication, Development, and Information Sciences, The University of Hong Kong, Hong Kong, China
- Xu Xu
  - School of Foreign Languages, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, 200240, China

6
Wang T, Xu X, Xie X, Ng ML. Probing Lexical Ambiguity in Chinese Characters via Their Word Formations: Convergence of Perceived and Computed Metrics. Cogn Sci 2023; 47:e13379. [PMID: 37988245 DOI: 10.1111/cogs.13379] [Received: 01/04/2023] [Revised: 09/07/2023] [Accepted: 11/02/2023] [Indexed: 11/23/2023]
Abstract
Lexical ambiguity is pervasive in language, and the nature of the representations of an ambiguous word's multiple meanings is yet to be fully understood. With a special focus on Chinese characters, the present study first established that native speakers' perception of a character's number of meanings was heavily influenced by the availability of its distinct word formations, whereas whether these meanings were perceived as closely related was driven by further conceptual analysis. These notions were operationalized as two computed metrics, which assessed the degree of dispersion across individual word formations and the degree of propinquity across clusters of word formations, respectively, in a distributional semantic space. The observed correlations between the computed and the perceived metrics indicated that using word formations to tap into the meaning representations of Chinese characters was indeed cognitively plausible. The results demonstrate the extent to which distributional semantics can inform about meaning representations of Chinese characters, which has theoretical implications for the representation of ambiguous words more generally.
Affiliation(s)
- Tianqi Wang
  - School of Foreign Languages, Shanghai Jiao Tong University
  - Speech Science Laboratory, The University of Hong Kong
- Xu Xu
  - School of Foreign Languages, Shanghai Jiao Tong University
- Xurong Xie
  - Beijing Key Lab of Human-Computer Interaction, Institute of Software, Chinese Academy of Sciences

7
Heitmeier M, Chuang YY, Baayen RH. How trial-to-trial learning shapes mappings in the mental lexicon: Modelling lexical decision with linear discriminative learning. Cogn Psychol 2023; 146:101598. [PMID: 37716109 PMCID: PMC10589761 DOI: 10.1016/j.cogpsych.2023.101598] [Received: 02/15/2023] [Revised: 08/23/2023] [Accepted: 09/02/2023] [Indexed: 09/18/2023]
Abstract
Trial-to-trial effects have been found in a number of studies, indicating that processing a stimulus influences responses in subsequent trials. A special case is priming effects, which have been modelled successfully with error-driven learning (Marsolek, 2008), implying that participants are continuously learning during experiments. This study investigates whether trial-to-trial learning can be detected in an unprimed lexical decision experiment. We used the Discriminative Lexicon Model (DLM; Baayen et al., 2019), a model of the mental lexicon with meaning representations from distributional semantics, which models error-driven incremental learning with the Widrow-Hoff rule. We used data from the British Lexicon Project (BLP; Keuleers et al., 2012) and simulated the lexical decision experiment with the DLM on a trial-by-trial basis for each subject individually. Then, reaction times were predicted with Generalized Additive Models (GAMs), using measures derived from the DLM simulations as predictors. We extracted measures from two simulations per subject (one with learning updates between trials and one without), and used them as input to two GAMs. Learning-based models showed better model fit than the non-learning ones for the majority of subjects. Our measures also provide insights into lexical processing and individual differences. This demonstrates the potential of the DLM to model behavioural data and leads to the conclusion that trial-to-trial learning can indeed be detected in unprimed lexical decision. Our results support the possibility that our lexical knowledge is subject to continuous changes.
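The Widrow-Hoff rule mentioned above is the delta rule: after each trial, the weights mapping form cues onto a semantic vector are nudged in proportion to the prediction error. A minimal sketch, assuming a comprehension-style mapping from binary form cues to a small real-valued semantic vector; the cue set, vector dimensions, learning rate `eta`, and the function names `predict` and `widrow_hoff_update` are hypothetical and far smaller than anything used in the DLM.

```python
def predict(W, c):
    """Predicted semantic vector: the cue row vector c multiplied by weight matrix W."""
    return [sum(c[i] * W[i][j] for i in range(len(c))) for j in range(len(W[0]))]

def widrow_hoff_update(W, c, s, eta):
    """One Widrow-Hoff (delta rule) step, W += eta * outer(c, s - cW), applied in place."""
    error = [target - pred for target, pred in zip(s, predict(W, c))]
    for i, c_i in enumerate(c):
        for j, e_j in enumerate(error):
            W[i][j] += eta * c_i * e_j  # inactive cues (c_i == 0) are left unchanged
    return W

# Hypothetical example: 3 form cues, 2-dimensional semantic target.
W = [[0.0, 0.0] for _ in range(3)]
c = [1, 0, 1]        # cues 0 and 2 are present in this word form
s = [1.0, -0.5]      # target semantic vector

widrow_hoff_update(W, c, s, eta=0.1)
print(predict(W, c))  # after one trial: approximately [0.2, -0.1]

for _ in range(49):   # 49 more trials with the same item
    widrow_hoff_update(W, c, s, eta=0.1)
print(predict(W, c))  # close to the target [1.0, -0.5]
```

Only the active cues change, and for this input each step shrinks the error by the constant factor (1 - eta * ||c||^2) = 0.8, so repeated exposure drives the prediction toward the target. Running a simulation with and without such updates between trials mirrors the learning versus non-learning contrast described in the abstract.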
8
Wolfer S. Is More Always Better? Testing the Addition Bias for German Language Statistics. Cogn Sci 2023; 47:e13339. [PMID: 37705294 DOI: 10.1111/cogs.13339] [Received: 05/17/2023] [Revised: 08/15/2023] [Accepted: 08/26/2023] [Indexed: 09/15/2023]
Abstract
This replication study aims to investigate a potential bias toward addition in the German language, building upon previous findings of Winter and colleagues who identified a similar bias in English. Our results confirm a bias in word frequencies and binomial expressions, aligning with these previous findings. However, the analysis of distributional semantics based on word vectors did not yield consistent results for German. Furthermore, our study emphasizes the crucial role of selecting appropriate translational equivalents, highlighting the significance of considering language-specific factors when testing for such biases for languages other than English.
9
Bonandrini R, Amenta S, Sulpizio S, Tettamanti M, Mazzucchelli A, Marelli M. Form to meaning mapping and the impact of explicit morpheme combination in novel word processing. Cogn Psychol 2023; 145:101594. [PMID: 37598658 DOI: 10.1016/j.cogpsych.2023.101594] [Received: 02/02/2023] [Revised: 06/25/2023] [Accepted: 08/01/2023] [Indexed: 08/22/2023]
Abstract
In the present study, we leveraged computational methods to explore the extent to which, relative to direct access to semantics from orthographic cues, the additional appreciation of morphological cues is advantageous while inducing the meaning of affixed pseudo-words. We re-analyzed data from a study on a lexical decision task for affixed pseudo-words. We considered a parsimonious model only including semantic variables (namely, semantic neighborhood density, entropy, magnitude, stem proximity) derived through a word-form-to-meaning approach (ngram-based). We then explored the extent to which the addition of equivalent semantic variables derived by combining semantic information from morphemes (combination-based) improved the fit of the statistical model explaining human data. Results suggest that semantic information can be extracted from arbitrary clusters of letters, yet a computational model of semantic access also including a combination-based strategy based on explicit morphological information better captures the cognitive mechanisms underlying human performance. This is particularly evident when participants recognize affixed pseudo-words as meaningful stimuli.
Affiliation(s)
- Simona Amenta
  - Department of Psychology, University of Milano-Bicocca, Milan, Italy
- Simone Sulpizio
  - Department of Psychology, University of Milano-Bicocca, Milan, Italy
- Marco Tettamanti
  - Department of Psychology, University of Milano-Bicocca, Milan, Italy
- Marco Marelli
  - Department of Psychology, University of Milano-Bicocca, Milan, Italy

10
Hörberg T, Larsson M, Olofsson JK. The Semantic Organization of the English Odor Vocabulary. Cogn Sci 2022; 46:e13205. [PMID: 36334010 DOI: 10.1111/cogs.13205] [Received: 01/27/2022] [Revised: 09/06/2022] [Accepted: 09/16/2022] [Indexed: 11/11/2022]
Abstract
The vocabulary for describing odors in English natural language is not well understood, as prior studies of odor descriptions have often relied on preselected descriptors and odor ratings. Here, we present a data-driven approach that automatically identifies English odor descriptors based on their degree of olfactory association, and derive their semantic organization from their distributions in natural texts, using a distributional-semantic language model. We identify 243 descriptors that are much more strongly associated with olfaction than English words in general. We then derive the semantic organization of these olfactory descriptors, and find that it is captured by four clusters that we name Offensive, Malodorous, Fragrant, and Edible. The semantic space derived from our model primarily differentiates descriptors in terms of pleasantness and edibility along which our four clusters are positioned, and is similar to a space derived from perceptual data. The semantic organization of odor vocabulary can thus be mapped using natural language data (e.g., online text), without the limitations of odor-perceptual data and preselected descriptors. Our method may thus facilitate research on olfaction, a sensory system known to often elude verbal description.
11
Jiang H, Frank MC, Kulkarni V, Fourtassi A. Exploring Patterns of Stability and Change in Caregivers' Word Usage Across Early Childhood. Cogn Sci 2022; 46:e13177. [PMID: 35820173 DOI: 10.1111/cogs.13177] [Received: 04/04/2021] [Revised: 04/22/2022] [Accepted: 06/11/2022] [Indexed: 11/26/2022]
Abstract
The linguistic input children receive across early childhood plays a crucial role in shaping their knowledge about the world. To study this input, researchers have begun applying distributional semantic models to large corpora of child-directed speech, extracting various patterns of word use/co-occurrence. Previous work using these models has not measured how these patterns may change throughout development, however. In this work, we leverage natural language processing methods, originally developed to study historical language change, to compare caregivers' use of words when talking to younger versus older children. Some words' usage changed more than others; this variability could be predicted based on the word's properties at both the individual and category levels. These findings suggest that caregivers' changing patterns of word use may play a role in scaffolding children's acquisition of conceptual structure in early development.
Affiliation(s)
- Hang Jiang
  - Symbolic Systems Program, Stanford University

12
Günther F, Marelli M. Patterns in CAOSS: Distributed representations predict variation in relational interpretations for familiar and novel compound words. Cogn Psychol 2022; 134:101471. [PMID: 35339747 DOI: 10.1016/j.cogpsych.2022.101471] [Received: 07/19/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 12/01/2022]
Abstract
While distributional semantic models that represent word meanings as high-dimensional vectors induced from large text corpora have been shown to successfully predict human behavior across a wide range of tasks, they have also received criticism from different directions. These include concerns over their interpretability (how can numbers specifying abstract, latent dimensions represent meaning?) and their ability to capture variation in meaning (how can a single vector representation capture multiple different interpretations for the same expression?). Here, we demonstrate that semantic vectors can indeed rise to these challenges, by training a mapping system (a simple linear regression) that predicts inter-individual variation in relational interpretations for compounds such as wood brush (for example, brush FOR wood, or brush MADE OF wood) from (compositional) semantic vectors representing the meanings of these compounds. These predictions consistently beat different random baselines, both for familiar compounds (moon light, Experiment 1) and for novel compounds (wood brush, Experiment 2), demonstrating that distributional semantic vectors encode variations in qualitative interpretations that can be decoded using techniques as simple as linear regression.
Affiliation(s)
- Marco Marelli
  - University of Milano-Bicocca, Milan, Italy
  - NeuroMI, Milan Center for Neuroscience, Milan, Italy

13
Johns BT. Accounting for item-level variance in recognition memory: Comparing word frequency and contextual diversity. Mem Cognit 2021. [PMID: 34811640 DOI: 10.3758/s13421-021-01249-z] [Accepted: 10/04/2021] [Indexed: 11/08/2022]
Abstract
Contextual diversity modifies word frequency by ignoring the repetition of words in context (Adelman, Brown, & Quesada, 2006, Psychological Science, 17(9), 814-823). Semantic diversity modifies contextual diversity by taking into account the uniqueness of the contexts that a word occurs in when calculating lexical strength (Jones, Johns, & Recchia, 2012, Canadian Journal of Experimental Psychology, 66, 115-124). Recent research has demonstrated that measures based on contextual and semantic diversity provide a considerable improvement over word frequency when accounting for lexical organization data (Johns, 2021, Psychological Review, 128, 525-557; Johns, Dye, & Jones, 2020a, Quarterly Journal of Experimental Psychology, 73, 841-855). The article demonstrates that these same findings generalize to word-level episodic recognition rates, using the previously released data of Cortese, Khanna, and Hacker (Cortese et al., 2010, Memory, 18, 595-609) and Cortese, McCarty, and Schock (Cortese et al., 2015, Quarterly Journal of Experimental Psychology, 68, 1489-1501). It was found that including the best fitting contextual diversity model allowed for a very large increase in variance accounted for over previously used variables, such as word frequency, signalling commonality with results from the lexical organization literature. The findings of this article suggest that current trends in the collection of megadata sets of human behavior (e.g., Balota et al., 2007, Behavior Research Methods, 39(3), 445-459) provide a promising avenue to develop new theoretically oriented models of word-level episodic recognition data.
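The distinction drawn above between word frequency and contextual diversity can be made concrete with a small sketch. It assumes whitespace tokenization and treats each document as one context; the function names and the four-sentence corpus are hypothetical, and semantic diversity, which additionally weights contexts by their uniqueness, is not implemented here.

```python
from collections import Counter

def word_frequency(docs):
    """Total number of tokens of each word across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

def contextual_diversity(docs):
    """Number of documents containing each word at least once (repetitions ignored)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.split()))
    return counts

# Hypothetical mini-corpus; each string is treated as one context.
docs = [
    "the dog saw the dog and the dog",
    "a cat slept",
    "the cat ran",
    "my cat ate",
]

freq = word_frequency(docs)
cd = contextual_diversity(docs)
print(freq["dog"], cd["dog"])  # 3 1
print(freq["cat"], cd["cat"])  # 3 3
```

Here "dog" and "cat" have the same frequency, but "dog" occurs in only one context while "cat" occurs in three, so contextual diversity separates two words that raw frequency treats as equivalent.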
14
Westera M, Gupta A, Boleda G, Padó S. Distributional Models of Category Concepts Based on Names of Category Members. Cogn Sci 2021; 45:e13029. [PMID: 34490924 DOI: 10.1111/cogs.13029] [Received: 06/02/2021] [Revised: 05/31/2021] [Accepted: 07/08/2021] [Indexed: 11/29/2022]
Abstract
Cognitive scientists have long used distributional semantic representations of categories. The predominant approach uses distributional representations of category-denoting nouns, such as "city" for the category city. We propose a novel scheme that represents categories as prototypes over representations of the names of their members, such as "Barcelona," "Mumbai," and "Wuhan" for the category city. This name-based representation empirically outperforms the noun-based representation in two experiments (modeling human judgments of category relatedness and predicting category membership), with particular improvements for ambiguous nouns. We discuss the model complexity of both classes of models and argue that the name-based model has superior explanatory potential with regard to concept acquisition.
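The name-based scheme represents a category by a prototype computed over the vectors of its members' names. A minimal sketch, assuming the prototype is the centroid (component-wise mean) and that membership is scored by cosine similarity to it; the 2-D vectors, the extra items "Lisbon" and "banana", and the function names are hypothetical, and the paper's exact prototype construction may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def prototype(vectors):
    """Centroid (component-wise mean) of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical 2-D vectors for names of members of the category "city".
member_names = {
    "Barcelona": [0.9, 0.2],
    "Mumbai": [0.8, 0.4],
    "Wuhan": [1.0, 0.3],
}
city_prototype = prototype(list(member_names.values()))  # approximately [0.9, 0.3]

# Score two unseen items against the name-based prototype.
print(cosine(city_prototype, [0.85, 0.25]))  # hypothetical vector for "Lisbon": high
print(cosine(city_prototype, [0.1, 0.9]))    # hypothetical vector for "banana": low
```

With these toy vectors, the unseen name "Lisbon" scores much closer to the city prototype than "banana" does, which is the kind of comparison a membership-prediction task relies on.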
Affiliation(s)
- Abhijeet Gupta
  - Institut für Sprache und Information, Heinrich-Heine-Universität Düsseldorf
- Gemma Boleda
  - Department of Translation and Language Sciences, Universitat Pompeu Fabra
  - ICREA
- Sebastian Padó
  - Institut für Maschinelle Sprachverarbeitung, University of Stuttgart

15
De Deyne S, Navarro DJ, Collell G, Perfors A. Visual and Affective Multimodal Models of Word Meaning in Language and Mind. Cogn Sci 2021; 45:e12922. [PMID: 33432630 PMCID: PMC7816238 DOI: 10.1111/cogs.12922] [Received: 01/10/2019] [Revised: 10/26/2020] [Accepted: 11/10/2020] [Indexed: 01/16/2023]
Abstract
One of the main limitations of natural language‐based approaches to meaning is that they do not incorporate multimodal representations the way humans do. In this study, we evaluate how well different kinds of models account for people's representations of both concrete and abstract concepts. The models we compare include unimodal distributional linguistic models as well as multimodal models which combine linguistic with perceptual or affective information. There are two types of linguistic models: those based on text corpora and those derived from word association data. We present two new studies and a reanalysis of a series of previous studies. The studies demonstrate that both visual and affective multimodal models better capture behavior that reflects human representations than unimodal linguistic models. The size of the multimodal advantage depends on the nature of semantic representations involved, and it is especially pronounced for basic‐level concepts that belong to the same superordinate category. Additional visual and affective features improve the accuracy of linguistic models based on text corpora more than those based on word associations; this suggests systematic qualitative differences between what information is encoded in natural language versus what information is reflected in word associations. Altogether, our work presents new evidence that multimodal information is important for capturing both abstract and concrete words and that fully representing word meaning requires more than purely linguistic information. Implications for both embodied and distributional views of semantic representation are discussed.
Affiliation(s)
- Simon De Deyne
  - School of Psychological Sciences, University of Melbourne
- Andrew Perfors
  - School of Psychological Sciences, University of Melbourne

16
Capuano F, Dudschig C, Günther F, Kaup B. Semantic Similarity of Alternatives Fostered by Conversational Negation. Cogn Sci 2021; 45:e13015. [PMID: 34288035 DOI: 10.1111/cogs.13015] [Received: 01/18/2021] [Revised: 06/01/2021] [Accepted: 06/05/2021] [Indexed: 11/29/2022]
Abstract
Conversational negation often behaves differently from negation as a logical operator: when rejecting a state of affairs, it does not present all members of the complement set as equally plausible alternatives, but rather suggests some of them as more plausible than others (e.g., "This is not a dog, it is a wolf/*screwdriver"). Entities that are semantically similar to a negated entity tend to be judged as better alternatives (Kruszewski et al., 2016). In fact, Kruszewski et al. (2016) show that the cosine similarity scores between the distributional semantic representations of a negated noun and its potential alternatives are highly correlated with human plausibility ratings of those alternatives. In a series of cloze tasks, we show that negation likewise restricts the production of plausible alternatives to similar entities. Furthermore, completions to negative sentences appear to be even more restricted than completions to an affirmative conjunctive context, hinting at a peculiarity of negation.
Affiliation(s)
- Barbara Kaup
- Department of Psychology, University of Tübingen
17
Chang LM, Deák GO. Adjacent and Non-Adjacent Word Contexts Both Predict Age of Acquisition of English Words: A Distributional Corpus Analysis of Child-Directed Speech. Cogn Sci 2020; 44:e12899. [PMID: 33164262 DOI: 10.1111/cogs.12899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/21/2018] [Revised: 07/27/2020] [Accepted: 08/04/2020] [Indexed: 12/01/2022]
Abstract
Children show a remarkable degree of consistency in learning some words earlier than others. What patterns of word usage predict variations among words in age of acquisition? We use distributional analysis of a naturalistic corpus of child-directed speech to create quantitative features representing natural variability in word contexts. We evaluate two sets of features: One set is generated from the distribution of words into frames defined by the two adjacent words. These features primarily encode syntactic aspects of word usage. The other set is generated from non-adjacent co-occurrences between words. These features encode complementary thematic aspects of word usage. Regression models using these distributional features to predict age of acquisition of 656 early-acquired English words indicate that both types of features improve predictions over simpler models based on frequency and appearance in salient or simple utterance contexts. Syntactic features were stronger predictors of children's production than comprehension, whereas thematic features were stronger predictors of comprehension. Overall, earlier acquisition was predicted by features representing frames that select for nouns and verbs, and by thematic content related to food and face-to-face play topics; later acquisition was predicted by features representing frames that select for pronouns and question words, and by content related to narratives and object play.
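The first feature set can be illustrated with a minimal sketch, assuming whitespace tokenization, lowercased text, and a hypothetical `<s>` symbol marking utterance edges; `frame_counts` (a hypothetical helper) tallies, for each word, the frames formed by its immediate left and right neighbours. The authors' full pipeline (frame selection, weighting, or dimensionality reduction) is not reproduced here, and the utterances are invented.

```python
from collections import Counter, defaultdict


def frame_counts(utterances, boundary="<s>"):
    """Count (left word, right word) frames around every token.

    Utterance edges are padded with a boundary symbol so that first and
    last words still receive a frame.
    """
    counts = defaultdict(Counter)
    for utterance in utterances:
        tokens = [boundary] + utterance.lower().split() + [boundary]
        for i in range(1, len(tokens) - 1):
            frame = (tokens[i - 1], tokens[i + 1])
            counts[tokens[i]][frame] += 1
    return counts


# Hypothetical child-directed utterances, for illustration only.
corpus = [
    "you want the ball",
    "you want the cup",
    "where is the ball",
]
features = frame_counts(corpus)
print(features["ball"])
print(features["the"])
```

Here "ball" occurs twice in the frame (the, <s>), that is, utterance-finally after "the", a frame typical of nouns.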
18
Kelly MA, Arora N, West RL, Reitter D. Holographic Declarative Memory: Distributional Semantics as the Architecture of Memory. Cogn Sci 2020; 44:e12904. [PMID: 33140517 DOI: 10.1111/cogs.12904] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/27/2019] [Revised: 03/30/2020] [Accepted: 08/31/2020] [Indexed: 11/29/2022]
Abstract
We demonstrate that the key components of cognitive architectures (declarative and procedural memory) and their key capabilities (learning, memory retrieval, probability judgment, and utility estimation) can be implemented as algebraic operations on vectors and tensors in a high-dimensional space using a distributional semantics model. High-dimensional vector spaces underlie the success of modern machine learning techniques based on deep learning. However, while neural networks have an impressive ability to process data to find patterns, they do not typically model high-level cognition, and it is often unclear how they work. Symbolic cognitive architectures can capture the complexities of high-level cognition and provide human-readable, explainable models, but scale poorly to naturalistic, non-symbolic, or big data. Vector-symbolic architectures, where symbols are represented as vectors, bridge the gap between the two approaches. We posit that cognitive architectures, if implemented in a vector-space model, represent a useful, explanatory model of the internal representations of otherwise opaque neural architectures. Our proposed model, Holographic Declarative Memory (HDM), is a vector-space model based on distributional semantics. HDM accounts for primacy and recency effects in free recall, the fan effect in recognition, probability judgments, and human performance on an iterated decision task. HDM provides a flexible, scalable alternative to symbolic cognitive architectures at a level of description that bridges symbolic, quantum, and neural models of cognition.
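The binding operation that holographic vector-symbolic architectures commonly rely on can be demonstrated with holographic reduced representations (HRRs): two vectors are bound by circular convolution and approximately recovered by convolving with the involution of one of them (circular correlation). This is a generic HRR sketch, assuming random Gaussian vectors and an illustrative dimensionality of 512; it is not the HDM implementation, whose encoding details the abstract does not give.

```python
import math
import random


def circular_convolution(a, b):
    """Bind two vectors: c[k] = sum over j of a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def involution(a):
    """Approximate inverse used for unbinding: a_inv[k] = a[(-k) mod n]."""
    n = len(a)
    return [a[(-k) % n] for k in range(n)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))


def random_vector(n, rng):
    """Elements drawn from a normal distribution with variance 1/n."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]


rng = random.Random(0)   # fixed seed so the run is reproducible
n = 512                  # illustrative dimensionality
role, filler, distractor = (random_vector(n, rng) for _ in range(3))

trace = circular_convolution(role, filler)               # store the pair
decoded = circular_convolution(involution(role), trace)  # query with the role

print(f"similarity to filler:     {cosine(decoded, filler):.2f}")
print(f"similarity to distractor: {cosine(decoded, distractor):.2f}")
```

Because unbinding is only approximate, the decoded vector is a noisy copy of the filler: its similarity to the filler is typically well above 0.5, while its similarity to an unrelated vector stays near zero. The direct O(n²) convolution is used for clarity; FFT-based convolution is the usual choice at larger dimensionalities.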
Affiliation(s)
- Mary Alexandria Kelly
- Department of Computer Science, Bucknell University
- College of Information Sciences and Computing, The Pennsylvania State University
- Nipun Arora
- Department of Cognitive Science, Carleton University
- Robert L West
- Department of Cognitive Science, Carleton University
- David Reitter
- College of Information Sciences and Computing, The Pennsylvania State University
- Google Research
19
Abstract
This paper introduces a novel collection of word embeddings, numerical representations of lexical semantics, in 55 languages, trained on a large corpus of pseudo-conversational speech transcriptions from television shows and movies. The embeddings were trained on the OpenSubtitles corpus using the fastText implementation of the skipgram algorithm. Performance comparable with (and in some cases exceeding) embeddings trained on non-conversational (Wikipedia) text is reported on standard benchmark evaluation datasets. A novel evaluation method of particular relevance to psycholinguists is also introduced: prediction of experimental lexical norms in multiple languages. The models, as well as code for reproducing the models and all analyses reported in this paper (implemented as a user-friendly Python package), are freely available at: https://github.com/jvparidon/subs2vec.
20
Lee SJ, Weinberg BD, Gore A, Banerjee I. A Scalable Natural Language Processing for Inferring BT-RADS Categorization from Unstructured Brain Magnetic Resonance Reports. J Digit Imaging 2020; 33:1393-400. [PMID: 32495125 DOI: 10.1007/s10278-020-00350-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/24/2022]
Abstract
The aim of this study is to develop an automated classification method for Brain Tumor Reporting and Data System (BT-RADS) categories from unstructured and structured brain magnetic resonance imaging (MR) reports. This retrospective study included 1410 BT-RADS structured reports dated from January 2014 to December 2017 and a test set of 109 unstructured brain MR reports dated from January 2010 to December 2014. Text vector representations and semantic word embeddings were generated from individual report sections (e.g., "History," "Findings") using Tf-idf statistics and a fine-tuned word2vec model, respectively. Section-wise ensemble models were trained using gradient boosting (XGBoost), elastic net regularization, and random forests, and classification accuracy was evaluated on an independent test set of unstructured brain MR reports and a validation set of BT-RADS structured reports. Section-wise ensemble models using XGBoost and word2vec semantic word embeddings were more accurate than those using Tf-idf statistics when classifying unstructured reports, with an F1 score of 0.72. In contrast, models using traditional Tf-idf statistics outperformed the word2vec semantic approach for categorization from structured reports, with an F1 score of 0.98. The proposed natural language processing pipeline is capable of inferring BT-RADS report scores from unstructured reports after training on structured report data. Our study provides a detailed experimentation process and may provide guidance for the development of RADS-focused information extraction (IE) applications from structured and unstructured radiology reports.
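The Tf-idf representation mentioned above can be sketched as follows, assuming whitespace tokenization and the plain tf × log(N/df) weighting; the abstract does not specify the study's preprocessing or Tf-idf variant, and the report sections in `findings` are invented for illustration.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


# Hypothetical "Findings" sections, for illustration only.
findings = [
    "stable enhancing lesion no new enhancement",
    "new enhancing lesion increased edema",
    "stable postoperative changes",
]
for vec in tfidf(findings):
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```

A term that occurs in every document receives zero weight, while a term confined to a single document (here "edema") receives the largest idf, log 3.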
21
Günther F, Marelli M, Bölte J. Semantic transparency effects in German compounds: A large dataset and multiple-task investigation. Behav Res Methods 2020; 52:1208-24. [PMID: 32052353 DOI: 10.3758/s13428-019-01311-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
In the present study, we provide a comprehensive analysis and a multi-dimensional dataset of semantic transparency measures for 1810 German compound words. Compound words are considered semantically transparent when the contribution of the constituents' meaning to the compound meaning is clear (as in airport), but the degree of semantic transparency varies between compounds (compare strawberry or sandman). Our dataset includes both compositional and relatedness-based semantic transparency measures, also differentiated by constituents. The measures are obtained from a computational and fully implemented semantic model based on distributional semantics. We validate the measures using data from four behavioral experiments: Explicit transparency ratings, two different lexical decision tasks using different nonwords, and an eye-tracking study. We demonstrate that different semantic effects emerge in different behavioral tasks, which can only be captured using a multi-dimensional approach to semantic transparency. We further provide the semantic transparency measures derived from the model for a dataset of 40,475 additional German compounds, as well as for 2061 novel German compounds.
22
Johns BT, Mewhort DJK, Jones MN. The Role of Negative Information in Distributional Semantic Learning. Cogn Sci 2019; 43:e12730. [PMID: 31087587 DOI: 10.1111/cogs.12730] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/14/2018] [Revised: 01/18/2019] [Accepted: 03/25/2019] [Indexed: 11/29/2022]
Abstract
Distributional models of semantics learn word meanings from contextual co-occurrence patterns across a large sample of natural language. Early models, such as LSA and HAL (Landauer & Dumais, 1997; Lund & Burgess, 1996), counted co-occurrence events; later models, such as BEAGLE (Jones & Mewhort, 2007), replaced counting co-occurrences with vector accumulation. All of these models learned from positive information only: Words that occur together within a context become related to each other. A recent class of distributional models, referred to as neural embedding models, is based on a prediction process embedded in the functioning of a neural network: Such models predict words that should surround a target word in a given context (e.g., word2vec; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). An error signal derived from the prediction is used to update each word's representation via backpropagation. However, predictive models differ in another key respect: they use negative information in addition to positive information to develop a semantic representation. The models use negative examples to predict words that should not surround a word in a given context. As before, an error signal derived from the prediction prompts an update of the word's representation, a procedure referred to as negative sampling. Standard uses of word2vec recommend drawing at least as many negative samples as positive ones. The use of negative information in developing a representation of semantic information is often thought to be intimately associated with word2vec's prediction process. We assess the role of negative information in developing a semantic representation and show that its power does not reflect the use of a prediction mechanism. Finally, we show how negative information can be efficiently integrated into classic count-based semantic models using parameter-free analytical transformations.
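One standard, parameter-free way to reweight a word-by-context count matrix is positive pointwise mutual information (PPMI), which scores each cell against the co-occurrence expected if word and context were independent. It is shown here only as a generic example of such a transformation; whether it corresponds to the transformations Johns et al. propose is an assumption the abstract does not settle, and the count matrix is hypothetical.

```python
import math


def ppmi(counts):
    """Positive pointwise mutual information of a word-by-context count matrix.

    PMI(w, c) = log(P(w, c) / (P(w) * P(c))). Pairs seen less often than
    chance predicts get negative PMI and are clipped to zero; empty cells
    (log 0) are also set to zero.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    weighted = []
    for i, row in enumerate(counts):
        out_row = []
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                out_row.append(0.0)
                continue
            pmi = math.log(n_ij * total / (row_sums[i] * col_sums[j]))
            out_row.append(max(pmi, 0.0))
        weighted.append(out_row)
    return weighted


# Hypothetical counts: rows are words, columns are contexts.
counts = [
    [10, 0, 2],
    [0, 8, 4],
    [3, 3, 0],
]
for row in ppmi(counts):
    print([round(x, 3) for x in row])
```

Cells that co-occur less often than chance, such as row 0, column 2 here, have negative PMI and are clipped to zero.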
Affiliation(s)
- Brendan T Johns
- Department of Communicative Disorders and Sciences, University at Buffalo
- Michael N Jones
- Department of Psychological and Brain Sciences, Indiana University
23
Meng X, Ganoe CH, Sieberg RT, Cheung YY, Hassanpour S. Assisting radiologists with reporting urgent findings to referring physicians: A machine learning approach to identify cases for prompt communication. J Biomed Inform 2019; 93:103169. [PMID: 30959206 DOI: 10.1016/j.jbi.2019.103169] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/11/2018] [Revised: 03/15/2019] [Accepted: 04/04/2019] [Indexed: 10/27/2022]
Abstract
Radiologists are expected to expediently communicate critical and unexpected findings to referring clinicians to prevent delayed diagnosis and treatment of patients. However, competing demands such as heavy workload, along with a lack of administrative support, resulted in communication failures that accounted for 7% of the malpractice payments made from 2004 to 2008 in the United States. To address this problem, we have developed a novel machine learning method that can automatically and accurately identify cases that require prompt communication to referring physicians based on analyzing the associated radiology reports. This semi-supervised learning approach requires a minimal amount of manual annotation and was trained on a large multi-institutional radiology report repository from three major external healthcare organizations. To test our approach, we created a corpus of 480 radiology reports from our own institution, in which two radiologists double-annotated the cases that required prompt communication. Our evaluation on the test corpus achieved an F-score of 74.5% and recall of 90.0% in identifying cases for prompt communication. The implementation of the proposed approach as part of an online decision support system can assist radiologists in identifying radiological cases for prompt communication to referring physicians to avoid or minimize potential harm to patients.
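The two reported figures also fix the precision. Assuming the F-score is the balanced F1 (the harmonic mean of precision P and recall R; the abstract does not name the variant), solving F1 = 2PR / (P + R) for P gives P = F1 · R / (2R - F1) = 0.745 × 0.900 / (1.800 - 0.745), or about 0.64. A quick check:

```python
def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)


p = precision_from_f1(0.745, 0.900)
print(f"implied precision = {p:.3f}")  # prints: implied precision = 0.636
```

Under this assumption, roughly one in three cases flagged for prompt communication would be a false alarm, consistent with a system that favours recall.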
Affiliation(s)
- Xing Meng
- Computer Science Department, Dartmouth College, Hanover, NH 03755, USA
- Craig H Ganoe
- Biomedical Data Science Department, Dartmouth College, Hanover, NH 03755, USA
- Ryan T Sieberg
- Radiology Department, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
- Yvonne Y Cheung
- Radiology Department, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
- Saeed Hassanpour
- Computer Science Department, Dartmouth College, Hanover, NH 03755, USA; Biomedical Data Science Department, Dartmouth College, Hanover, NH 03755, USA; Epidemiology Department, Dartmouth College, Hanover, NH 03755, USA
24
Ning W, Chan S, Beam A, Yu M, Geva A, Liao K, Mullen M, Mandl KD, Kohane I, Cai T, Yu S. Feature extraction for phenotyping from semantic and knowledge resources. J Biomed Inform 2019; 91:103122. [PMID: 30738949 PMCID: PMC6424621 DOI: 10.1016/j.jbi.2019.103122] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]
Abstract
OBJECTIVE Phenotyping algorithms can efficiently and accurately identify patients with a specific disease phenotype and construct electronic health record (EHR)-based cohorts for subsequent clinical or genomic studies. Previous studies have introduced unsupervised EHR-based feature selection methods that yielded algorithms with high accuracy. However, those selection methods still require expert intervention to tweak the parameter settings according to the EHR data distribution for each phenotype. To further accelerate the development of phenotyping algorithms, we propose a fully automated and robust unsupervised feature selection method that leverages only publicly available medical knowledge sources, instead of EHR data.
METHODS SEmantics-Driven Feature Extraction (SEDFE) collects medical concepts from online knowledge sources as candidate features and gives them vector-form distributional semantic representations derived with neural word embedding and the Unified Medical Language System Metathesaurus. A number of features that are semantically closest to the target phenotype and that sufficiently characterize it are determined by a linear decomposition criterion and are selected for the final classification algorithm.
RESULTS SEDFE was compared with the EHR-based SAFE algorithm and domain experts on feature selection for the classification of five phenotypes, including coronary artery disease, rheumatoid arthritis, Crohn's disease, ulcerative colitis, and pediatric pulmonary arterial hypertension, using both supervised and unsupervised approaches. Algorithms yielded by SEDFE achieved accuracy comparable to those yielded by SAFE and expert-curated features. SEDFE is also robust to the input semantic vectors.
CONCLUSION SEDFE attains satisfactory performance in unsupervised feature selection for EHR phenotyping. Both fully automated and EHR-independent, this method promises efficiency and accuracy in developing algorithms for high-throughput phenotyping.
Affiliation(s)
- Wenxin Ning
- Department of Industrial Engineering, Tsinghua University, Beijing, China
- Stephanie Chan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Andrew Beam
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Ming Yu
- Department of Industrial Engineering, Tsinghua University, Beijing, China
- Alon Geva
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, MA, USA; Department of Anesthesia, Harvard Medical School, Boston, MA, USA
- Katherine Liao
- Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Mary Mullen
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Isaac Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Sheng Yu
- Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China; Institute for Data Science, Tsinghua University, Beijing, China
25
Banerjee I, Bozkurt S, Alkim E, Sagreiya H, Kurian AW, Rubin DL. Automatic inference of BI-RADS final assessment categories from narrative mammography report findings. J Biomed Inform 2019; 92:103137. [PMID: 30807833 DOI: 10.1016/j.jbi.2019.103137] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/20/2018] [Revised: 10/02/2018] [Accepted: 02/15/2019] [Indexed: 12/29/2022]
Abstract
We propose an efficient natural language processing approach for inferring the BI-RADS final assessment categories by analyzing only the mammogram findings reported by the mammographer in narrative form. The proposed hybrid method integrates semantic term embedding with distributional semantics, producing a context-aware vector representation of unstructured mammography reports. A large corpus of unannotated mammography reports (300,000) was used to learn the context of the key terms using a distributional semantics approach, and the trained model was applied to generate context-aware vector representations of the reports annotated with BI-RADS category (22,091). The vectorized reports were utilized to train a supervised classifier to derive the BI-RADS assessment class. Even though the majority of the proposed embedding pipeline is unsupervised, the classifier was able to recognize substantial semantic information for deriving the BI-RADS categorization not only on a holdout internal test set but also on an external validation set (1900 reports). Our proposed method outperforms a recently published domain-specific rule-based system and could be relevant for evaluating concordance between radiologists. With minimal requirement for task-specific customization, the proposed method can easily be transferred to a different domain to support large-scale text mining or derivation of patient phenotype.
Affiliation(s)
- Imon Banerjee
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Selen Bozkurt
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA; Department of Biostatistics and Medical Informatics, Faculty of Medicine, Akdeniz University, Antalya 07059, Turkey
- Emel Alkim
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Hersh Sagreiya
- Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA
- Allison W Kurian
- Medicine (Oncology) and Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA
- Daniel L Rubin
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA; Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA
26
Abstract
We studied contestant accuracy and error in a popular television quiz show, "Jeopardy!" Using vector-based knowledge representations obtained from distributional models of semantic memory, we computed the strength of association between clues and responses in over 5,000 televised games. Such representations have been shown to play a key role in memory and judgment, and consistent with this work, we find that contestants are more likely to provide correct responses when these responses are strongly associated with their clues, and more likely to provide incorrect responses when correct responses are weakly or negatively associated with their clues. This effect is stronger for easier questions with low monetary values, and for questions in which contestants compete to respond quickly. Our results show how distributional models of semantic memory can be used to predict human behavior in naturalistic high-level judgment tasks with skilled participants and significant monetary and social incentives.
27
Amith M, Cunningham R, Savas LS, Boom J, Schvaneveldt R, Tao C, Cohen T. Using Pathfinder networks to discover alignment between expert and consumer conceptual knowledge from online vaccine content. J Biomed Inform 2017; 74:33-45. [PMID: 28823922 DOI: 10.1016/j.jbi.2017.08.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/23/2017] [Revised: 05/28/2017] [Accepted: 08/14/2017] [Indexed: 10/19/2022]
Abstract
This study demonstrates the use of distributed vector representations and Pathfinder Network Scaling (PFNETS) to represent online vaccine content created by health experts and by laypeople. By analyzing a target audience's conceptualization of a topic, domain experts can develop targeted interventions to improve the basic health knowledge of consumers. The underlying assumption is that the content created by different groups reflects the mental organization of their knowledge. Applying automated text analysis to this content may elucidate differences between the knowledge structures of laypeople (health consumers) and professionals (health experts). This paper utilizes vaccine information generated by laypeople and health experts to investigate the utility of this approach. We used an established technique from cognitive psychology, Pathfinder Network Scaling, to infer the structure of the associational networks between concepts learned from online content using methods of distributional semantics. In doing so, we extend PFNETS from its original application, inferring the knowledge structures of individual participants, to inferring the prevailing knowledge structures within communities of content authors. The resulting graphs reveal opportunities for public health and vaccination education experts to improve communication and intervention efforts directed towards health consumers. Our efforts demonstrate the feasibility of using an automated procedure to examine the manifestation of conceptual models within large bodies of free text, revealing evidence of conflicting understanding of vaccine concepts among health consumers as compared with health experts. Additionally, this study provides insight into the differences between consumer and expert abstraction of domain knowledge, revealing vaccine-related knowledge gaps that suggest opportunities to improve provider-patient communication.
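A minimal sketch of Pathfinder scaling for the commonly used parameters r = ∞ and q = n - 1, assuming a symmetric distance matrix (for example, 1 minus the cosine similarity between concept vectors). Under these parameters a link is kept only if no indirect path has a smaller largest link. The concepts and distances below are hypothetical, and the parameter values used in the study are not stated in the abstract.

```python
import math


def pathfinder(dist):
    """Return the links kept by PFNET(r = infinity, q = n - 1).

    dist is a symmetric matrix of link distances (math.inf where no direct
    link exists). A link (i, j) survives only if no indirect path between
    i and j has a smaller maximum link distance.
    """
    n = len(dist)
    minimax = [row[:] for row in dist]
    # Floyd-Warshall variant in which a path costs its largest link
    # (the r = infinity Minkowski metric) rather than the sum of its links.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.isfinite(dist[i][j]) and dist[i][j] <= minimax[i][j]
    }


# Hypothetical concepts and distances (e.g., 1 - cosine similarity).
concepts = ["vaccine", "immunity", "autism", "schedule"]
dist = [
    [0.0, 0.2, 0.6, 0.3],
    [0.2, 0.0, 0.9, 0.5],
    [0.6, 0.9, 0.0, 0.8],
    [0.3, 0.5, 0.8, 0.0],
]
for i, j in sorted(pathfinder(dist)):
    print(concepts[i], "--", concepts[j])
```

With these toy distances only the three links through "vaccine" survive; weaker direct links such as immunity-schedule are pruned because a path through "vaccine" has a smaller maximum link.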
Affiliation(s)
- Muhammad Amith
- The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St, #600, Houston, TX, United States
- Rachel Cunningham
- Texas Children's Hospital, 6621 Fannin St, Houston, TX, United States
- Lara S Savas
- The University of Texas School of Public Health at Houston, 1200 Pressler Street, Houston, TX 77030, United States
- Julie Boom
- Texas Children's Hospital, 6621 Fannin St, Houston, TX, United States
- Roger Schvaneveldt
- Arizona State University, Tempe, AZ, United States; New Mexico State University, Las Cruces, NM, United States
- Cui Tao
- The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St, #600, Houston, TX, United States
- Trevor Cohen
- The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St, #600, Houston, TX, United States
28
Marelli M, Gagné CL, Spalding TL. Compounding as Abstract Operation in Semantic Space: Investigating relational effects through a large-scale, data-driven computational model. Cognition 2017; 166:207-224. [PMID: 28582684 DOI: 10.1016/j.cognition.2017.05.026] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/26/2016] [Revised: 05/10/2017] [Accepted: 05/17/2017] [Indexed: 11/25/2022]
Abstract
In many languages, compounding is a fundamental process for the generation of novel words. When this process is productive (as, e.g., in English), native speakers can juxtapose two words to create novel compounds that can be readily understood by other speakers. The present paper proposes a large-scale, data-driven computational system for compound semantic processing based on distributional semantics, the CAOSS model (Compounding as Abstract Operation in Semantic Space). In CAOSS, word meanings are represented as vectors encoding their lexical co-occurrences in a reference corpus. Given two constituent words, their composed representation (the compound) is computed by using matrices representing the abstract properties of constituent roles (modifier vs. head). The matrices are also induced through examples of language usage. The model is then validated against behavioral results concerning the processing of novel compounds, and in particular relational effects on response latencies. The effects of relational priming and relational dominance are considered. CAOSS predictions are shown to pattern with previous results, in terms of both the impact of relational information and the dissociations related to the different constituent roles. The simulations indicate that relational information is implicitly reflected in language usage, suggesting that human speakers can learn these aspects from language experience and automatically apply them to the processing of new word combinations. The present model is flexible enough to emulate this procedure, suggesting that relational effects might emerge as a by-product of nuanced operations across distributional patterns.
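A composed compound vector of the kind described above can be sketched as the sum of two role-specific linear maps, c = M·u + H·v, where u is the modifier vector, v the head vector, and M and H are the modifier and head matrices. This is a toy illustration, assuming hand-set 3 × 3 matrices and vectors (all values hypothetical); estimating the matrices from corpus examples is not shown, and the exact CAOSS formulation should be checked against the paper.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def compose(modifier_vec, head_vec, M, H):
    """Compound vector as the sum of role-specific linear maps: M.u + H.v."""
    return [a + b for a, b in zip(mat_vec(M, modifier_vec), mat_vec(H, head_vec))]


# Hypothetical role matrices; in a trained model they would be estimated
# from the vectors of existing compounds and their constituents.
M = [[0.5, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.1]]
H = [[0.4, 0.0, 0.1], [0.0, 0.9, 0.0], [0.0, 0.0, 0.8]]
u = [1.0, 0.0, 2.0]  # hypothetical modifier vector
v = [0.0, 1.0, 1.0]  # hypothetical head vector

compound = compose(u, v, M, H)
print([round(x, 3) for x in compound])
```

Because M and H differ, swapping the constituents (u as head, v as modifier) generally yields a different compound vector, which is how such a model can represent the modifier/head asymmetry.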
Affiliation(s)
- Marco Marelli
- University of Milano-Bicocca, Department of Psychology, Piazza dell'Ateneo Nuovo 1, 20126 Milano, Italy
- Christina L Gagné
- University of Alberta, Department of Psychology, P217 Biological Sciences Building, Edmonton, Alberta T6G 2E9, Canada
- Thomas L Spalding
- University of Alberta, Department of Psychology, P217 Biological Sciences Building, Edmonton, Alberta T6G 2E9, Canada
29
Lazaridou A, Marelli M, Baroni M. Multimodal Word Meaning Induction From Minimal Exposure to Natural Text. Cogn Sci 2017; 41 Suppl 4:677-705. [PMID: 28323353 DOI: 10.1111/cogs.12481] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/02/2015] [Revised: 10/15/2016] [Accepted: 10/20/2016] [Indexed: 11/29/2022]
Abstract
By the time they reach early adulthood, English speakers are familiar with the meaning of thousands of words. In the last decades, computational simulations known as distributional semantic models (DSMs) have demonstrated that it is possible to induce word meaning representations solely from word co-occurrence statistics extracted from a large amount of text. However, while these models learn in batch mode from large corpora, human word learning proceeds incrementally after minimal exposure to new words. In this study, we run a set of experiments investigating whether minimal distributional evidence from very short passages suffices to trigger successful word learning in subjects, testing their linguistic and visual intuitions about the concepts associated with new words. After confirming that subjects are indeed very efficient distributional learners even from small amounts of evidence, we test a DSM on the same multimodal task, finding that it behaves in a remarkably human-like way. We conclude that DSMs provide a convincing computational account of word learning even at the early stages in which a word is first encountered, and the way they build meaning representations can offer new insights into human language acquisition.
Affiliation(s)
- Marco Marelli
- Department of Experimental Psychology, Ghent University
- Marco Baroni
- Center for Mind/Brain Sciences, University of Trento
30
Abstract
This paper concerns the generation of distributed vector representations of biomedical concepts from structured knowledge, in the form of subject-relation-object triplets known as semantic predications. Specifically, we evaluate the extent to which a representational approach we have developed for this purpose previously, known as Predication-based Semantic Indexing (PSI), might benefit from insights gleaned from neural-probabilistic language models, which have enjoyed a surge in popularity in recent years as a means to generate distributed vector representations of terms from free text. To do so, we develop a novel neural-probabilistic approach to encoding predications, called Embedding of Semantic Predications (ESP), by adapting aspects of the Skipgram with Negative Sampling (SGNS) algorithm to this purpose. We compare ESP and PSI across a number of tasks including recovery of encoded information, estimation of semantic similarity and relatedness, and identification of potentially therapeutic and harmful relationships using both analogical retrieval and supervised learning. We find advantages for ESP in some, but not all of these tasks, revealing the contexts in which the additional computational work of neural-probabilistic modeling is justified.
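The skip-gram with negative sampling (SGNS) objective that ESP adapts can be written, for one observed pair with target vector t, context vector c, and negative samples n_i, as loss = -log σ(t·c) - Σ log σ(-t·n_i). A minimal sketch of that generic loss follows, assuming plain Python lists as vectors; it does not reproduce how ESP encodes predicates, and the vectors in the usage lines are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(target, context, negatives):
    """Skip-gram negative-sampling loss for one observed (target, context) pair.

    The first term rewards a high score for the observed pair; each
    negative sample adds a term rewarding a low score for an unobserved pair.
    """
    loss = -log_sigmoid(dot(target, context))
    for negative in negatives:
        loss -= log_sigmoid(-dot(target, negative))
    return loss


# Hypothetical 2-d vectors for a subject concept, an observed object
# concept, and two randomly drawn negative samples.
subject = [0.5, 0.1]
observed_object = [0.4, 0.2]
negative_samples = [[-0.3, 0.1], [0.0, -0.5]]
print(round(sgns_loss(subject, observed_object, negative_samples), 4))
```

Training minimizes this loss by gradient descent, moving the observed pair's vectors together and pushing the negative samples apart; the gradient step is omitted here.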
Affiliation(s)
- Trevor Cohen
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States.
31
Abstract
This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle a variety of problems in psycholinguistics. Finally, as the papers in this issue also do, we highlight some methodological issues raised by the analysis of such datasets.
Affiliation(s)
- Emmanuel Keuleers
- Department of Experimental Psychology, Ghent University, Gent, Belgium
32
Ahltorp M, Skeppstedt M, Kitajima S, Henriksson A, Rzepka R, Araki K. Expansion of medical vocabularies using distributional semantics on Japanese patient blogs. J Biomed Semantics 2016; 7:58. [PMID: 27671202 PMCID: PMC5037651 DOI: 10.1186/s13326-016-0093-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 03/01/2015] [Accepted: 08/15/2016] [Indexed: 01/11/2023]
Abstract
Background: Research on medical vocabulary expansion from large corpora has primarily been conducted on text written in English or similar languages, owing to the limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, also essential for text mining from corpora written in languages other than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthography as well as text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs.
Methods: Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3 × 100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new terms to include in the vocabulary were suggested. The method was evaluated for its ability to retrieve the remaining n terms in existing medical vocabularies.
Results: Removing case particles and using a context window size of 1+1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8+8 was better for Body Part. For a candidate list of length 10n, the use of different cluster sizes affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For a list of the top n candidates for Body Part, however, clusters with a size of up to two terms were slightly more useful than larger clusters. For Pharmaceutical Drug, the best settings resulted in a recall of 25% for a candidate list of the top n terms and a recall of 68% for the top 10n. For a candidate list of the top 10n candidates, the second-best results were obtained for Medical Finding: a recall of 58%, compared to 46% for Body Part. Taking only the top n candidates into account, however, resulted in a recall of 23% for Body Part, compared to 16% for Medical Finding.
Conclusions: Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.
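The Methods step (random indexing, then extraction of unknown terms close to cluster centroids) can be sketched in plain Python under stated assumptions. The sketch replaces agglomerative hierarchical clustering with a single centroid over all seed terms, uses a hypothetical toy English corpus instead of tokenized Japanese blog text, and the function names, dimensionality (2000) and number of non-zero entries (10) are illustrative choices, not the authors' settings. The 1+1 context window corresponds to `window=1`.

```python
import math
import random


def random_indexing(sentences, window=1, dim=2000, nonzeros=10, seed=0):
    """Random indexing: every word gets a sparse ternary index vector
    (a few random +1/-1 entries); a word's context vector is the sum of the
    index vectors of the words within `window` positions of it."""
    rng = random.Random(seed)
    index = {}
    for sent in sentences:
        for word in sent:
            if word not in index:
                positions = rng.sample(range(dim), nonzeros)
                index[word] = [(p, rng.choice((1, -1))) for p in positions]
    context = {word: [0] * dim for word in index}
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    for pos, sign in index[sent[j]]:
                        context[word][pos] += sign
    return context


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_vocabulary(vectors, seeds, top_k):
    """Simplification (assumption): one centroid over all seeds instead of
    one centroid per cluster. Non-seed terms are ranked by cosine
    similarity to that centroid and the top_k are returned as candidates."""
    dim = len(next(iter(vectors.values())))
    centroid = [sum(vectors[s][k] for s in seeds) / len(seeds) for k in range(dim)]
    candidates = [w for w in vectors if w not in seeds]
    candidates.sort(key=lambda w: cosine(vectors[w], centroid), reverse=True)
    return candidates[:top_k]


# Hypothetical toy corpus, one tokenized sentence per entry.
corpus = [s.split() for s in [
    "patient took aspirin daily", "patient took ibuprofen daily",
    "patient took paracetamol daily", "patient has headache today",
    "patient has fever today",
]]
vectors = random_indexing(corpus, window=1)
print(expand_vocabulary(vectors, {"aspirin", "ibuprofen"}, top_k=3))
```

With `window=1`, `paracetamol` occurs in exactly the same contexts as the seeds `aspirin` and `ibuprofen`, so its context vector points in the same direction as the centroid and it is ranked first. Producing several centroids per category would additionally require clustering the seeds' context vectors.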
Affiliation(s)
- Maria Skeppstedt
- Department of Computer Science, Linnaeus University/Gavagai, Växjö/Stockholm, Sweden.
- Shiho Kitajima
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
- Aron Henriksson
- Department of Computer and Systems Sciences (DSV), Stockholm University, Stockholm, Sweden
- Rafal Rzepka
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
- Kenji Araki
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
33
Abstract
BACKGROUND Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question remains, however, how best to make use of a set of diverse representations of clinical events - modeled in an ensemble of semantic spaces - for the purpose of predictive modeling.
METHODS Three different ways of exploiting a set of ten distributed representations of four types of clinical events - diagnosis codes, drug codes, measurements, and words in clinical notes - are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces - corresponding to the considered data types - of a given context window size.
RESULTS The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases.
CONCLUSIONS The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy - significantly outperforming the considered alternatives - involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.
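The per-tree sampling described in the METHODS (a bootstrap replicate plus a randomly selected context window size, whose semantic space then represents the sampled examples) can be sketched as follows. This is a simplified illustration under assumptions: a single semantic space per window size covers all event types, a record is represented by averaging its event vectors, the event codes and vectors are hypothetical, and fitting a randomized tree on each `(X, y)` pair is left to whichever tree learner is available.

```python
import random


def featurize(record, space):
    """Assumed representation: the mean of a record's event vectors in one
    semantic space; events missing from the space are skipped."""
    dim = len(next(iter(space.values())))
    vecs = [space[e] for e in record if e in space]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def sample_tree_inputs(records, labels, spaces, n_trees, seed=0):
    """For each tree: draw one context window size and a bootstrap replicate
    of the training set, then represent every sampled record in the
    semantic space learned with that window size."""
    rng = random.Random(seed)
    n = len(records)
    window_sizes = sorted(spaces)
    per_tree = []
    for _ in range(n_trees):
        window = rng.choice(window_sizes)            # one space per tree
        boot = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        X = [featurize(records[i], spaces[window]) for i in boot]
        y = [labels[i] for i in boot]
        per_tree.append((window, X, y))
    return per_tree


# Hypothetical 2-d "semantic spaces", keyed by context window size.
spaces = {
    2: {"dx_A": [1.0, 0.0], "rx_B": [0.0, 1.0]},
    8: {"dx_A": [0.6, 0.8], "rx_B": [0.8, 0.6]},
}
records = [["dx_A"], ["dx_A", "rx_B"], ["rx_B"]]
labels = [1, 1, 0]
for window, X, y in sample_tree_inputs(records, labels, spaces, n_trees=3):
    print(window, X, y)  # each (X, y) would be used to train one tree
```

Drawing the window size once per tree, rather than once per feature, mirrors the abstract's description that each tree sees its whole feature set in one set of semantic spaces of a given window size.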
Affiliation(s)
- Aron Henriksson
- Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden.
- Jing Zhao
- Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden
- Hercules Dalianis
- Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden
- Henrik Boström
- Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden
34
Henriksson A, Kvist M, Dalianis H, Duneld M. Identifying adverse drug event information in clinical notes with distributional semantic representations of context. J Biomed Inform 2015; 57:333-49. [PMID: 26291578 DOI: 10.1016/j.jbi.2015.08.013] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 05/05/2015] [Revised: 07/19/2015] [Accepted: 08/10/2015] [Indexed: 10/23/2022]
Abstract
Post-marketing drug safety surveillance has traditionally relied on the voluntary reporting of individual cases of adverse drug events (ADEs), but other sources of information are now being explored for this purpose, including electronic health records (EHRs), which give access to enormous amounts of longitudinal observations of patients' treatment and drug use. Adverse drug events, which can be encoded in EHRs with certain diagnosis codes, are, however, heavily underreported. It is therefore important to develop computational methods capable of processing the more unstructured EHR data in the form of clinical notes, where clinicians may describe and reason about suspected ADEs. In this study, we report on the creation of an annotated corpus of Swedish health records for the purpose of learning to identify information pertaining to ADEs in clinical notes. To this end, three key tasks are tackled: recognizing relevant named entities (disorders, symptoms, drugs), labeling attributes of the recognized entities (negation, speculation, temporality), and labeling relationships between them (indication, adverse drug event). For each of the three tasks, leveraging models of distributional semantics - i.e., unsupervised methods that exploit co-occurrence information to model, typically in vector space, the meaning of words - and, in particular, combinations of such models, is shown to improve predictive performance. The ability to make use of such unsupervised methods is critical when faced with large amounts of sparse and high-dimensional data, especially in domains where annotated resources are scarce.
Affiliation(s)
- Aron Henriksson
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden.
- Maria Kvist
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden; Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Sweden.
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden.
- Martin Duneld
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden.
35
Shang N, Xu H, Rindflesch TC, Cohen T. Identifying plausible adverse drug reactions using knowledge extracted from the literature. J Biomed Inform 2014; 52:293-310. [PMID: 25046831 DOI: 10.1016/j.jbi.2014.07.011] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 02/24/2014] [Revised: 06/06/2014] [Accepted: 07/10/2014] [Indexed: 01/08/2023]
Abstract
Pharmacovigilance involves continually monitoring drug safety after drugs are put on the market. To aid this process, algorithms have been developed for identifying strongly correlated drug/adverse drug reaction (ADR) pairs from data sources such as adverse event reporting systems or electronic health records. These methods are generally statistical in nature and do not draw upon the large volumes of knowledge embedded in the biomedical literature. In this paper, we investigate the ability of scalable Literature Based Discovery (LBD) methods to identify side effects of pharmaceutical agents. The advantage of LBD methods is that they can provide evidence from the literature to support the plausibility of a drug/ADR association, thereby assisting human review to validate the signal, which is an essential component of pharmacovigilance. To do so, we draw upon vast repositories of knowledge that have been extracted from the biomedical literature by two natural language processing tools, MetaMap and SemRep. We evaluate two LBD methods that scale comfortably to the volume of knowledge available in these repositories. Specifically, we evaluate Reflective Random Indexing (RRI), a model based on concept-level co-occurrence, and Predication-based Semantic Indexing (PSI), a model that encodes the nature of the relationship between concepts to support analogical reasoning about drug-effect relationships. An evaluation set was constructed from the Side Effect Resource 2 (SIDER2), which contains known drug/ADR relations, and the models were evaluated for their ability to "rediscover" these relations. We demonstrate that both RRI and PSI can recover known drug-adverse event associations. However, PSI performed better overall and has the additional advantage of being able to recover the literature underlying the reasoning pathways it used to make its predictions.
Affiliation(s)
- Ning Shang
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States.
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States
- Trevor Cohen
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States
36
Zhang S, Elhadad N. Unsupervised biomedical named entity recognition: experiments with clinical and biological texts. J Biomed Inform 2013; 46:1088-98. [PMID: 23954592 DOI: 10.1016/j.jbi.2013.08.004] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Received: 02/20/2013] [Revised: 07/25/2013] [Accepted: 08/07/2013] [Indexed: 10/26/2022]
Abstract
Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and, ultimately, reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre- and task-dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to the challenges of entity boundary detection and entity type classification that does not rely on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Candidate entities are then classified into categories of interest by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary-matching approach. A detailed error analysis provides a road map for future work.
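The two steps that follow noun phrase chunking, filtering candidate phrases by inverse document frequency and classifying the survivors distributionally, can be sketched as follows. The threshold, the toy documents, the entity types and the "signature" vectors are all hypothetical; in a real system each type's signature would be derived from the contexts of known examples of that type, and the exact filtering rule used by the authors may differ from the `log(N / df)` cut-off assumed here.

```python
import math


def idf_filter(candidates_per_doc, min_idf):
    """Keep candidate phrases whose inverse document frequency, log(N / df),
    is at least min_idf, so that very common phrases are dropped."""
    n_docs = len(candidates_per_doc)
    df = {}
    for doc in candidates_per_doc:
        for phrase in set(doc):  # document frequency counts each doc once
            df[phrase] = df.get(phrase, 0) + 1
    return {p for p, count in df.items() if math.log(n_docs / count) >= min_idf}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(vector, signatures):
    """Assign the entity type whose signature vector is most similar."""
    return max(signatures, key=lambda t: cosine(vector, signatures[t]))


# Hypothetical chunker output: candidate noun phrases per document.
docs = [["the patient", "chest pain"], ["the patient", "aspirin"],
        ["the patient", "fever"]]
kept = idf_filter(docs, min_idf=0.5)  # "the patient" has idf 0 and is dropped
# Hypothetical 2-d signature vectors for two entity types.
signatures = {"problem": [1.0, 0.1], "treatment": [0.1, 1.0]}
print(sorted(kept), classify([0.2, 0.9], signatures))
```

A phrase occurring in every document, such as `the patient`, gets an IDF of zero and is removed, while rarer phrases survive and are passed to the classifier.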
Affiliation(s)
- Shaodian Zhang
- Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, VC-5, New York, NY 10032, USA.