Reference Citation Analysis

Searched Name: Distributional semantics
Total Articles: 36 (from Reference Citation Analysis)
Article PDFs: 4
Cited by > 0: 23

Petilli MA, Marelli M, Mazzoni G, Marchetti M, Rinaldi L, Gatti D. From vector spaces to DRM lists: False Memory Generator, a software for automated generation of lists of stimuli inducing false memories. Behav Res Methods 2024:10.3758/s13428-024-02425-0. [PMID: 38710986 DOI: 10.3758/s13428-024-02425-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/07/2024] [Indexed: 05/08/2024]

Gatti D, Raveling L, Petrenco A, Günther F. Valence without meaning: Investigating form and semantic components in pseudowords valence. Psychon Bull Rev 2024:10.3758/s13423-024-02487-3. [PMID: 38565840 DOI: 10.3758/s13423-024-02487-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/05/2024] [Indexed: 04/04/2024]

Hagoort P, Özyürek A. Extending the Architecture of Language From a Multimodal Perspective. Top Cogn Sci 2024. [PMID: 38493475 DOI: 10.1111/tops.12728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/19/2024]

Fernandino L, Conant LL. The Primacy of Experience in Language Processing: Semantic Priming Is Driven Primarily by Experiential Similarity. bioRxiv 2023:2023.03.21.533703. [PMID: 36993310 PMCID: PMC10055357 DOI: 10.1101/2023.03.21.533703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]

Abstract

The organization of semantic memory, including memory for word meanings, has long been a central question in cognitive science. Although there is general agreement that word meaning representations must make contact with sensory-motor and affective experiences in a non-arbitrary fashion, the nature of this relationship remains controversial. One prominent view proposes that word meanings are represented directly in terms of their experiential content (i.e., sensory-motor and affective representations). Opponents of this view argue that the representation of word meanings reflects primarily taxonomic structure, that is, their relationships to natural categories. In addition, the recent success of language models based on word co-occurrence (i.e., distributional) information in emulating human linguistic behavior has led to proposals that this kind of information may play an important role in the representation of lexical concepts. We used a semantic priming paradigm designed for representational similarity analysis (RSA) to quantitatively assess how well each of these theories explains the representational similarity pattern for a large set of words. Crucially, we used partial correlation RSA to account for intercorrelations between model predictions, which allowed us to assess, for the first time, the unique effect of each model. Semantic priming was driven primarily by experiential similarity between prime and target, with no evidence of an independent effect of distributional or taxonomic similarity. Furthermore, only the experiential models accounted for unique variance in priming after partialling out explicit similarity ratings. These results support experiential accounts of semantic representation and indicate that, despite their good performance at some linguistic tasks, the distributional models evaluated here do not encode the same kind of information used by the human semantic system.
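
The partial-correlation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, and it assumes the priming effect and each model's predicted similarity are available as equal-length lists with one value per prime-target pair. The recursive formula removes one control variable at a time, so the result is the correlation between two variables after the influence of the remaining models has been partialled out.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation of x and y after partialling out every vector in `controls`.

    Uses the standard recursion: remove the first control z, with all
    three pairwise correlations already adjusted for the remaining controls.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: `priming` and `distributional` are related only
# through what each shares with `experiential`.
experiential = [1.0, 1.0, -1.0, -1.0]
priming = [2.0, 0.0, 0.0, -2.0]
distributional = [2.0, 0.0, -2.0, 0.0]

raw = pearson(priming, distributional)
unique = partial_corr(priming, distributional, [experiential])
```

In this toy case the raw correlation between `priming` and `distributional` is positive, but it vanishes once `experiential` is controlled for, which is the kind of pattern a partial-correlation analysis is designed to expose.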

Wang T, Xu X. The good, the bad, and the ambivalent: Extrapolating affective values for 38,000+ Chinese words via a computational model. Behav Res Methods 2023:10.3758/s13428-023-02274-3. [PMID: 37968560 DOI: 10.3758/s13428-023-02274-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/16/2023] [Indexed: 11/17/2023]

Wang T, Xu X, Xie X, Ng ML. Probing Lexical Ambiguity in Chinese Characters via Their Word Formations: Convergence of Perceived and Computed Metrics. Cogn Sci 2023;47:e13379. [PMID: 37988245 DOI: 10.1111/cogs.13379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 09/07/2023] [Accepted: 11/02/2023] [Indexed: 11/23/2023]

Heitmeier M, Chuang YY, Baayen RH. How trial-to-trial learning shapes mappings in the mental lexicon: Modelling lexical decision with linear discriminative learning. Cogn Psychol 2023;146:101598. [PMID: 37716109 PMCID: PMC10589761 DOI: 10.1016/j.cogpsych.2023.101598] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/15/2023] [Revised: 08/23/2023] [Accepted: 09/02/2023] [Indexed: 09/18/2023]

Wolfer S. Is More Always Better? Testing the Addition Bias for German Language Statistics. Cogn Sci 2023;47:e13339. [PMID: 37705294 DOI: 10.1111/cogs.13339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 08/15/2023] [Accepted: 08/26/2023] [Indexed: 09/15/2023]

Bonandrini R, Amenta S, Sulpizio S, Tettamanti M, Mazzucchelli A, Marelli M. Form to meaning mapping and the impact of explicit morpheme combination in novel word processing. Cogn Psychol 2023;145:101594. [PMID: 37598658 DOI: 10.1016/j.cogpsych.2023.101594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 06/25/2023] [Accepted: 08/01/2023] [Indexed: 08/22/2023]

Hörberg T, Larsson M, Olofsson JK. The Semantic Organization of the English Odor Vocabulary. Cogn Sci 2022;46:e13205. [PMID: 36334010 DOI: 10.1111/cogs.13205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 09/06/2022] [Accepted: 09/16/2022] [Indexed: 11/11/2022]

Jiang H, Frank MC, Kulkarni V, Fourtassi A. Exploring Patterns of Stability and Change in Caregivers' Word Usage Across Early Childhood. Cogn Sci 2022;46:e13177. [PMID: 35820173 DOI: 10.1111/cogs.13177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2021] [Revised: 04/22/2022] [Accepted: 06/11/2022] [Indexed: 11/26/2022]

Günther F, Marelli M. Patterns in CAOSS: Distributed representations predict variation in relational interpretations for familiar and novel compound words. Cogn Psychol 2022;134:101471. [PMID: 35339747 DOI: 10.1016/j.cogpsych.2022.101471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 12/01/2022]

Johns BT. Accounting for item-level variance in recognition memory: Comparing word frequency and contextual diversity. Mem Cognit 2021. [PMID: 34811640 DOI: 10.3758/s13421-021-01249-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/04/2021] [Indexed: 11/08/2022]

Westera M, Gupta A, Boleda G, Padó S. Distributional Models of Category Concepts Based on Names of Category Members. Cogn Sci 2021;45:e13029. [PMID: 34490924 DOI: 10.1111/cogs.13029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/02/2021] [Revised: 05/31/2021] [Accepted: 07/08/2021] [Indexed: 11/29/2022]

De Deyne S, Navarro DJ, Collell G, Perfors A. Visual and Affective Multimodal Models of Word Meaning in Language and Mind. Cogn Sci 2021;45:e12922. [PMID: 33432630 PMCID: PMC7816238 DOI: 10.1111/cogs.12922] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/10/2019] [Revised: 10/26/2020] [Accepted: 11/10/2020] [Indexed: 01/16/2023]

Capuano F, Dudschig C, Günther F, Kaup B. Semantic Similarity of Alternatives Fostered by Conversational Negation. Cogn Sci 2021;45:e13015. [PMID: 34288035 DOI: 10.1111/cogs.13015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2021] [Revised: 06/01/2021] [Accepted: 06/05/2021] [Indexed: 11/29/2022]

Chang LM, Deák GO. Adjacent and Non-Adjacent Word Contexts Both Predict Age of Acquisition of English Words: A Distributional Corpus Analysis of Child-Directed Speech. Cogn Sci 2020;44:e12899. [PMID: 33164262 DOI: 10.1111/cogs.12899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/21/2018] [Revised: 07/27/2020] [Accepted: 08/04/2020] [Indexed: 12/01/2022]

Kelly MA, Arora N, West RL, Reitter D. Holographic Declarative Memory: Distributional Semantics as the Architecture of Memory. Cogn Sci 2020;44:e12904. [PMID: 33140517 DOI: 10.1111/cogs.12904] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/27/2019] [Revised: 03/30/2020] [Accepted: 08/31/2020] [Indexed: 11/29/2022]

van Paridon J, Thompson B. subs2vec: Word embeddings from subtitles in 55 languages. Behav Res Methods 2021;53:629-55. [PMID: 32789660 DOI: 10.3758/s13428-020-01406-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]

Lee SJ, Weinberg BD, Gore A, Banerjee I. A Scalable Natural Language Processing for Inferring BT-RADS Categorization from Unstructured Brain Magnetic Resonance Reports. J Digit Imaging 2020;33:1393-400. [PMID: 32495125 DOI: 10.1007/s10278-020-00350-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/24/2022]

Günther F, Marelli M, Bölte J. Semantic transparency effects in German compounds: A large dataset and multiple-task investigation. Behav Res Methods 2020;52:1208-24. [PMID: 32052353 DOI: 10.3758/s13428-019-01311-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]

Johns BT, Mewhort DJK, Jones MN. The Role of Negative Information in Distributional Semantic Learning. Cogn Sci 2019;43:e12730. [PMID: 31087587 DOI: 10.1111/cogs.12730] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/14/2018] [Revised: 01/18/2019] [Accepted: 03/25/2019] [Indexed: 11/29/2022]

Abstract

Distributional models of semantics learn word meanings from contextual co-occurrence patterns across a large sample of natural language. Early models, such as LSA and HAL (Landauer & Dumais, 1997; Lund & Burgess, 1996), counted co-occurrence events; later models, such as BEAGLE (Jones & Mewhort, 2007), replaced counting co-occurrences with vector accumulation. All of these models learned from positive information only: Words that occur together within a context become related to each other. A recent class of distributional models, referred to as neural embedding models, are based on a prediction process embedded in the functioning of a neural network: Such models predict words that should surround a target word in a given context (e.g., word2vec; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). An error signal derived from the prediction is used to update each word's representation via backpropagation. However, another key difference in predictive models is their use of negative information in addition to positive information to develop a semantic representation. The models use negative examples to predict words that should not surround a word in a given context. As before, an error signal derived from the prediction prompts an update of the word's representation, a procedure referred to as negative sampling. Standard uses of word2vec recommend a greater or equal ratio of negative to positive sampling. The use of negative information in developing a representation of semantic information is often thought to be intimately associated with word2vec's prediction process. We assess the role of negative information in developing a semantic representation and show that its power does not reflect the use of a prediction mechanism. Finally, we show how negative information can be efficiently integrated into classic count-based semantic models using parameter-free analytical transformations.
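
The negative-sampling update this abstract describes can be sketched in a few lines. This is a simplified illustration of a generic skip-gram negative-sampling step, not word2vec's actual code and not the analytical transformation the authors propose. The function names are hypothetical, and it assumes the negative examples are supplied by the caller, whereas real implementations draw them from a smoothed unigram distribution.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def sgns_step(target_vec, context_vecs, labels, lr=0.1):
    """One in-place update of a target word and its context words.

    labels[i] is 1 for an observed (positive) context word and 0 for a
    sampled negative. The prediction error (sigmoid score minus label)
    is the signal that moves the vectors.
    """
    grad_target = [0.0] * len(target_vec)
    for ctx, label in zip(context_vecs, labels):
        err = sigmoid(dot(target_vec, ctx)) - label
        for i in range(len(ctx)):
            grad_target[i] += err * ctx[i]      # uses ctx before it changes
            ctx[i] -= lr * err * target_vec[i]  # pull or push the context word
    for i in range(len(target_vec)):
        target_vec[i] -= lr * grad_target[i]


# Hypothetical two-dimensional vectors for one target word, one observed
# context word, and one negative example.
target = [0.1, 0.2]
positive = [0.3, -0.1]
negative = [0.2, 0.4]

for _ in range(100):
    sgns_step(target, [positive, negative], [1, 0])
```

Repeated steps raise the score of the observed pair and lower the score of the negative pair. The second effect is the "negative information" the abstract refers to.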

Meng X, Ganoe CH, Sieberg RT, Cheung YY, Hassanpour S. Assisting radiologists with reporting urgent findings to referring physicians: A machine learning approach to identify cases for prompt communication. J Biomed Inform 2019;93:103169. [PMID: 30959206 DOI: 10.1016/j.jbi.2019.103169] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/11/2018] [Revised: 03/15/2019] [Accepted: 04/04/2019] [Indexed: 10/27/2022]

Ning W, Chan S, Beam A, Yu M, Geva A, Liao K, Mullen M, Mandl KD, Kohane I, Cai T, Yu S. Feature extraction for phenotyping from semantic and knowledge resources. J Biomed Inform 2019;91:103122. [PMID: 30738949 PMCID: PMC6424621 DOI: 10.1016/j.jbi.2019.103122] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]

Abstract

OBJECTIVE

Phenotyping algorithms can efficiently and accurately identify patients with a specific disease phenotype and construct electronic health records (EHR)-based cohorts for subsequent clinical or genomic studies. Previous studies have introduced unsupervised EHR-based feature selection methods that yielded algorithms with high accuracy. However, those selection methods still require expert intervention to tweak the parameter settings according to the EHR data distribution for each phenotype. To further accelerate the development of phenotyping algorithms, we propose a fully automated and robust unsupervised feature selection method that leverages only publicly available medical knowledge sources, instead of EHR data.

METHODS

SEmantics-Driven Feature Extraction (SEDFE) collects medical concepts from online knowledge sources as candidate features and gives them vector-form distributional semantic representations derived with neural word embedding and the Unified Medical Language System Metathesaurus. A number of features that are semantically closest and that sufficiently characterize the target phenotype are determined by a linear decomposition criterion and are selected for the final classification algorithm.

RESULTS

SEDFE was compared with the EHR-based SAFE algorithm and domain experts on feature selection for the classification of five phenotypes including coronary artery disease, rheumatoid arthritis, Crohn's disease, ulcerative colitis, and pediatric pulmonary arterial hypertension using both supervised and unsupervised approaches. Algorithms yielded by SEDFE achieved comparable accuracy to those yielded by SAFE and expert-curated features. SEDFE is also robust to the input semantic vectors.

CONCLUSION

SEDFE attains satisfying performance in unsupervised feature selection for EHR phenotyping. Both fully automated and EHR-independent, this method promises efficiency and accuracy in developing algorithms for high-throughput phenotyping.

Banerjee I, Bozkurt S, Alkim E, Sagreiya H, Kurian AW, Rubin DL. Automatic inference of BI-RADS final assessment categories from narrative mammography report findings. J Biomed Inform 2019;92:103137. [PMID: 30807833 DOI: 10.1016/j.jbi.2019.103137] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/20/2018] [Revised: 10/02/2018] [Accepted: 02/15/2019] [Indexed: 12/29/2022]

Bhatia S, Walasek L. Association and response accuracy in the wild. Mem Cognit 2019;47:292-8. [PMID: 30324558 DOI: 10.3758/s13421-018-0869-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]

Amith M, Cunningham R, Savas LS, Boom J, Schvaneveldt R, Tao C, Cohen T. Using Pathfinder networks to discover alignment between expert and consumer conceptual knowledge from online vaccine content. J Biomed Inform 2017;74:33-45. [PMID: 28823922 DOI: 10.1016/j.jbi.2017.08.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/23/2017] [Revised: 05/28/2017] [Accepted: 08/14/2017] [Indexed: 10/19/2022]

Abstract

This study demonstrates the use of distributed vector representations and Pathfinder Network Scaling (PFNETS) to represent online vaccine content created by health experts and by laypeople. By analyzing a target audience's conceptualization of a topic, domain experts can develop targeted interventions to improve the basic health knowledge of consumers. The underlying assumption is that the content created by different groups reflects the mental organization of their knowledge. Applying automated text analysis to this content may elucidate differences between the knowledge structures of laypeople (health consumers) and professionals (health experts). This paper utilizes vaccine information generated by laypeople and health experts to investigate the utility of this approach. We used an established technique from cognitive psychology, Pathfinder Network Scaling, to infer the structure of the associational networks between concepts learned from online content using methods of distributional semantics. In doing so, we extend the original application of PFNETS, inferring knowledge structures from individual participants, to inferring the prevailing knowledge structures within communities of content authors. The resulting graphs reveal opportunities for public health and vaccination education experts to improve communication and intervention efforts directed towards health consumers. Our efforts demonstrate the feasibility of using an automated procedure to examine the manifestation of conceptual models within large bodies of free text, revealing evidence of conflicting understanding of vaccine concepts among health consumers as compared with health experts. Additionally, this study provides insight into the differences between consumer and expert abstraction of domain knowledge, revealing vaccine-related knowledge gaps that suggest opportunities to improve provider-patient communication.

Marelli M, Gagné CL, Spalding TL. Compounding as Abstract Operation in Semantic Space: Investigating relational effects through a large-scale, data-driven computational model. Cognition 2017;166:207-224. [PMID: 28582684 DOI: 10.1016/j.cognition.2017.05.026] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/26/2016] [Revised: 05/10/2017] [Accepted: 05/17/2017] [Indexed: 11/25/2022]

Lazaridou A, Marelli M, Baroni M. Multimodal Word Meaning Induction From Minimal Exposure to Natural Text. Cogn Sci 2017;41 Suppl 4:677-705. [PMID: 28323353 DOI: 10.1111/cogs.12481] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/02/2015] [Revised: 10/15/2016] [Accepted: 10/20/2016] [Indexed: 11/29/2022]

Cohen T, Widdows D. Embedding of semantic predications. J Biomed Inform 2017;68:150-166. [PMID: 28284761 DOI: 10.1016/j.jbi.2017.03.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/13/2016] [Revised: 02/27/2017] [Accepted: 03/05/2017] [Indexed: 11/20/2022]

Keuleers E, Balota DA. Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments. Q J Exp Psychol (Hove) 2016;68:1457-68. [PMID: 25975773 DOI: 10.1080/17470218.2015.1051065] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 10/23/2022]

Ahltorp M, Skeppstedt M, Kitajima S, Henriksson A, Rzepka R, Araki K. Expansion of medical vocabularies using distributional semantics on Japanese patient blogs. J Biomed Semantics 2016;7:58. [PMID: 27671202 PMCID: PMC5037651 DOI: 10.1186/s13326-016-0093-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 03/01/2015] [Accepted: 08/15/2016] [Indexed: 01/11/2023]

Abstract

Background

Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to a limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, essential also for text mining from corpora written in other languages than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthographics, as well as in terms of text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs.

Methods

Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3 × 100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new terms to include in the vocabulary were suggested. The method was evaluated for its ability to retrieve the remaining n terms in existing medical vocabularies.

Results

Removing case particles and using a context window size of 1+1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8+8 was better for Body Part. For a 10n long candidate list, the use of different cluster sizes affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For a list of top n candidates for Body Part, however, clusters with a size of up to two terms were slightly more useful than larger clusters. For Pharmaceutical Drug, the best settings resulted in a recall of 25 % for a candidate list of top n terms and a recall of 68 % for top 10n. For a candidate list of top 10n candidates, the second best results were obtained for Medical Finding: a recall of 58 %, compared to 46 % for Body Part. Only taking the top n candidates into account, however, resulted in a recall of 23 % for Body Part, compared to 16 % for Medical Finding.

Conclusions

Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters, not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.
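
The random indexing step named in the Methods can be sketched as follows. This is a toy illustration rather than the study's pipeline: the function names, the 100-dimensional vectors, the four non-zero entries per index vector, and the English example sentences are all assumptions made here, and the clustering and centroid steps are omitted. Each word receives a fixed sparse random index vector, and a term's context vector is the sum of the index vectors of the words found within the window around it (here 1+1, one of the window sizes the abstract reports).

```python
import math
import random
from collections import defaultdict


def index_vector(rng, dim=100, nonzero=4):
    """Sparse ternary vector: a few randomly placed +1/-1 entries, rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def random_indexing(sentences, window=1, dim=100, seed=0):
    """Return {word: context vector} for tokenized sentences."""
    rng = random.Random(seed)
    index = {}
    context = defaultdict(lambda: [0] * dim)
    for sent in sentences:
        for word in sent:
            if word not in index:
                index[word] = index_vector(rng, dim)
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                neighbour = index[sent[j]]
                vec = context[word]
                for k in range(dim):
                    vec[k] += neighbour[k]
    return dict(context)


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


# Hypothetical corpus: "aspirin" and "ibuprofen" occur in the same context.
corpus = [
    ["aspirin", "relieves", "pain"],
    ["ibuprofen", "relieves", "pain"],
    ["knee", "hurts", "badly"],
]
vectors = random_indexing(corpus, window=1)
drug_similarity = cosine(vectors["aspirin"], vectors["ibuprofen"])
cross_similarity = cosine(vectors["aspirin"], vectors["knee"])
```

Because the two drug names share their only neighbour, their context vectors coincide, while "knee" accumulates a different index vector. Candidate terms for a category would then be ranked by similarity to a cluster centroid, as the abstract describes.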

Henriksson A, Zhao J, Dalianis H, Boström H. Ensembles of randomized trees using diverse distributed representations of clinical events. BMC Med Inform Decis Mak 2016;16 Suppl 2:69. [PMID: 27459846 PMCID: PMC4965720 DOI: 10.1186/s12911-016-0309-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]

Abstract

BACKGROUND

Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events - modeled in an ensemble of semantic spaces - for the purpose of predictive modeling.

METHODS

Three different ways of exploiting a set of (ten) distributed representations of four types of clinical events - diagnosis codes, drug codes, measurements, and words in clinical notes - are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces - corresponding to the considered data types - of a given context window size.

RESULTS

The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases.

CONCLUSIONS

The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy - significantly outperforming the considered alternatives - involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.

Henriksson A, Kvist M, Dalianis H, Duneld M. Identifying adverse drug event information in clinical notes with distributional semantic representations of context. J Biomed Inform 2015;57:333-49. [PMID: 26291578 DOI: 10.1016/j.jbi.2015.08.013] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 05/05/2015] [Revised: 07/19/2015] [Accepted: 08/10/2015] [Indexed: 10/23/2022]

Shang N, Xu H, Rindflesch TC, Cohen T. Identifying plausible adverse drug reactions using knowledge extracted from the literature. J Biomed Inform 2014;52:293-310. [PMID: 25046831 DOI: 10.1016/j.jbi.2014.07.011] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 02/24/2014] [Revised: 06/06/2014] [Accepted: 07/10/2014] [Indexed: 01/08/2023]

Abstract

Pharmacovigilance involves continually monitoring drug safety after drugs are put to market. To aid this process, algorithms for the identification of strongly correlated drug/adverse drug reaction (ADR) pairs from data sources such as adverse event reporting systems or Electronic Health Records have been developed. These methods are generally statistical in nature, and do not draw upon the large volumes of knowledge embedded in the biomedical literature. In this paper, we investigate the ability of scalable Literature Based Discovery (LBD) methods to identify side effects of pharmaceutical agents. The advantage of LBD methods is that they can provide evidence from the literature to support the plausibility of a drug/ADR association, thereby assisting human review to validate the signal, which is an essential component of pharmacovigilance. To do so, we draw upon vast repositories of knowledge that has been extracted from the biomedical literature by two Natural Language Processing tools, MetaMap and SemRep. We evaluate two LBD methods that scale comfortably to the volume of knowledge available in these repositories. Specifically, we evaluate Reflective Random Indexing (RRI), a model based on concept-level co-occurrence, and Predication-based Semantic Indexing (PSI), a model that encodes the nature of the relationship between concepts to support reasoning analogically about drug-effect relationships. An evaluation set was constructed from the Side Effect Resource 2 (SIDER2), which contains known drug/ADR relations, and models were evaluated for their ability to "rediscover" these relations. In this paper, we demonstrate that both RRI and PSI can recover known drug-adverse event associations. However, PSI performed better overall, and has the additional advantage of being able to recover the literature underlying the reasoning pathways it used to make its predictions.

Zhang S, Elhadad N. Unsupervised biomedical named entity recognition: experiments with clinical and biological texts. J Biomed Inform 2013;46:1088-98. [PMID: 23954592 DOI: 10.1016/j.jbi.2013.08.004] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Received: 02/20/2013] [Revised: 07/25/2013] [Accepted: 08/07/2013] [Indexed: 10/26/2022]