1
Nag S, John S, Agrawal A. NSP-SCD: A corpus construction protocol for child-directed print in understudied languages. Behav Res Methods 2024; 56:2751-2764. [PMID: 38361097 PMCID: PMC11133114 DOI: 10.3758/s13428-024-02339-x] [Accepted: 01/09/2024] [Indexed: 02/17/2024]
Abstract
Child-directed print corpora enable systematic psycholinguistic investigations, but this research infrastructure is not available in many understudied languages. Moreover, researchers of understudied languages are dependent on manual tagging because precise automatized parsers are not yet available. One plausible way forward is to limit the intensive work to a small-sized corpus. However, with little systematic enquiry about approaches to corpus construction, it is unclear how robust a small corpus can be made. The current study examines the potential of a non-sequential sampling protocol for small corpus development (NSP-SCD) through a cross-corpora and within-corpus analysis. A corpus comprising 17,584 words was developed by applying the protocol to a larger corpus of 150,595 words from children's books for 3-to-10-year-olds. Although the larger corpus will by definition have more instances of unique words and unique orthographic units, the selectively sampled small corpus approximated the larger corpus for lexical and orthographic diversity and was equivalent for orthographic representation and word length. Psycholinguistic complexity increased by book level and varied by parts of speech. Finally, in a robustness check of lexical diversity, the non-sequentially sampled small corpus was more efficient than a same-sized corpus constructed by simply using all sentences from a few books (402 books vs. seven books). If a small corpus must be used, then non-sequential sampling from books stratified by book level makes the corpus statistics better approximate what is found in larger corpora. Overall, the protocol shows promise as a tool to advance the science of child language acquisition in understudied languages.
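The general idea of non-sequential sampling from books stratified by level can be illustrated with a minimal sketch. The abstract does not spell out the protocol's steps, so everything here is an assumption made for illustration: the function names, the equal word budget per book level, random sentence draws, and whitespace tokenisation. It is not the authors' procedure.

```python
import random


def nonsequential_sample(books_by_level, word_budget, seed=0):
    """Sample sentences from across all books at each level.

    books_by_level maps a book level to a list of books, where each book
    is a list of sentence strings. The word budget is split equally
    across levels (an assumption, not taken from the paper).
    """
    rng = random.Random(seed)
    per_level = word_budget // len(books_by_level)
    sample = []
    for level in sorted(books_by_level):
        # Pool sentences from every book at this level, then shuffle so
        # that no single book contributes a run of consecutive sentences.
        pool = [s for book in books_by_level[level] for s in book]
        rng.shuffle(pool)
        used = 0
        for sentence in pool:
            n = len(sentence.split())
            if used + n <= per_level:
                sample.append(sentence)
                used += n
    return sample


def type_token_counts(sentences):
    """Return (unique word types, word tokens) using whitespace tokens."""
    tokens = [w.lower() for s in sentences for w in s.split()]
    return len(set(tokens)), len(tokens)
```

Comparing `type_token_counts` for such a sample against the counts for a same-sized run of whole books gives a rough analogue of the lexical diversity check described in the abstract.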
Affiliation(s)
- Sonali Nag
  - Department of Education, University of Oxford, Oxford, UK
- Sunila John
  - Department of Speech and Hearing, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, India
- Aakash Agrawal
  - NeuroSpin, CEA, Gif-sur-Yvette, France
  - The Promise Foundation, Bangalore, India
2
Morkovina O, Manukyan P, Sharapkova A. Picture naming test through the prism of cognitive neuroscience and linguistics: adapting the test for cerebellar tumor survivors-or pouring new wine in old sacks? Front Psychol 2024; 15:1332391. [PMID: 38566942 PMCID: PMC10985186 DOI: 10.3389/fpsyg.2024.1332391] [Received: 11/02/2023] [Accepted: 02/20/2024] [Indexed: 04/04/2024]
Abstract
A picture naming test (PNT) has long been regarded as an integral part of neuropsychological assessment. In current research and clinical practice, it serves a variety of purposes. PNTs are used to assess the severity of speech impairment in aphasia, monitor possible cognitive decline in aging patients with or without age-related neurodegenerative disorders, track language development in children and map eloquent brain areas to be spared during surgery. In research settings, picture naming tests provide an insight into the process of lexical retrieval in monolingual and bilingual speakers. However, while numerous advances have occurred in linguistics and neuroscience since the classic, most widespread PNTs were developed, few of them have found their way into test design. Consequently, despite the popularity of PNTs in clinical and research practice, their relevance and objectivity remain questionable. The present study provides an overview of literature where relevant criticisms and concerns have been expressed over the recent decades. It aims to determine whether there is a significant gap between conventional test design and the current understanding of the mechanisms underlying lexical retrieval by focusing on the parameters that have been experimentally proven to influence picture naming. We discuss here the implications of these findings for improving and facilitating test design within the picture naming paradigm. Subsequently, we highlight the importance of designing specialized tests with a particular target group in mind, so that test variables could be selected for cerebellar tumor survivors.
Affiliation(s)
- Olga Morkovina
  - Laboratory of Diagnostics and Advancing Cognitive Functions, Research Institute for Brain Development and Peak Performance, RUDN University, Moscow, Russia
  - Department of English, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russia
- Piruza Manukyan
  - Laboratory of Diagnostics and Advancing Cognitive Functions, Research Institute for Brain Development and Peak Performance, RUDN University, Moscow, Russia
- Anastasia Sharapkova
  - Laboratory of Diagnostics and Advancing Cognitive Functions, Research Institute for Brain Development and Peak Performance, RUDN University, Moscow, Russia
  - Department of English Linguistics, Faculty of Philology, Lomonosov Moscow State University, Moscow, Russia
3
Korochkina M, Marelli M, Brysbaert M, Rastle K. The Children and Young People's Books Lexicon (CYP-LEX): A large-scale lexical database of books read by children and young people in the United Kingdom. Q J Exp Psychol (Hove) 2024:17470218241229694. [PMID: 38262912 DOI: 10.1177/17470218241229694] [Indexed: 01/25/2024]
Abstract
This article introduces the Children and Young People's Books Lexicon (CYP-LEX), a large-scale lexical database derived from books popular with children and young people in the United Kingdom. CYP-LEX includes 1,200 books evenly distributed across three age bands (7-9, 10-12, 13+) and comprises over 70 million tokens and over 105,000 types. For each word in each age band, we provide its raw and Zipf-transformed frequencies, all parts-of-speech in which it occurs with raw frequency and lemma for each occurrence, and measures of count-based contextual diversity. Together and individually, the three CYP-LEX age bands contain substantially more words than any other publicly available database of books for primary and secondary school children. Most of these words are very low in frequency, and a substantial proportion of the words in each age band do not occur on British television. Although the three age bands share some very frequent words, they differ substantially regarding words that occur less frequently, and this pattern also holds at the level of individual books. Initial analyses of CYP-LEX illustrate why independent reading constitutes a challenge for children and young people, and they also underscore the importance of reading widely for the development of reading expertise. Overall, CYP-LEX provides unprecedented insight into the nature of vocabulary in books that British children aged 7+ read, and is a highly valuable resource for those studying reading and language development.
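For readers unfamiliar with the two measures named above, the following minimal sketch shows one common way to compute them. It assumes the standard Zipf scale (log10 of frequency per million words, plus 3) without smoothing, and takes count-based contextual diversity to be the number of books in which a word occurs. The abstract does not state the exact formulas used for CYP-LEX, and the function names are hypothetical.

```python
import math
from collections import Counter


def zipf(count, total_tokens):
    """Zipf-scale frequency: log10(occurrences per million words) + 3."""
    per_million = count / total_tokens * 1_000_000
    return math.log10(per_million) + 3


def contextual_diversity(books):
    """Count, for each word, the number of books it appears in.

    books is a list of token lists, one list per book.
    """
    cd = Counter()
    for tokens in books:
        # set() ensures a word is counted once per book, however often
        # it is repeated within that book.
        cd.update(set(tokens))
    return cd
```

On this scale a word that occurs once per million tokens has a Zipf value of 3, and each step of 1 corresponds to a tenfold change in frequency.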
Affiliation(s)
- Maria Korochkina
  - Department of Psychology, Royal Holloway, University of London, Egham, UK
- Marco Marelli
  - Department of Psychology, University of Milano-Bicocca, Milan, Italy
- Marc Brysbaert
  - Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Kathleen Rastle
  - Department of Psychology, Royal Holloway, University of London, Egham, UK
4
Li L, Zhao W, Song M, Wang J, Cai Q. CCLOOW: Chinese children's lexicon of oral words. Behav Res Methods 2024; 56:846-859. [PMID: 36881355 DOI: 10.3758/s13428-023-02077-6] [Accepted: 01/16/2023] [Indexed: 03/08/2023]
Abstract
In this article, we introduce the Chinese Children's Lexicon of Oral Words (CCLOOW), the first lexical database based on animated movies and TV series for 3-to-9-year-old Chinese children. The database is computed from 2.7 million character tokens and 1.8 million word tokens. It contains 3920 unique character types and 22,229 word types. CCLOOW reports frequency and contextual diversity metrics of the characters and words, as well as length and syntactic categories of the words. CCLOOW frequency and contextual diversity measures correlated well with those of other Chinese lexical databases, particularly with those computed from children's books. The predictive validity of CCLOOW measures was confirmed with Grade 2 children's naming and lexical decision experiments. Further, we found that CCLOOW frequencies could explain a considerable proportion of variance in adults' written word recognition, indicating that early language experience might have lasting impacts on the mature lexicon. CCLOOW provides validated frequency and contextual diversity estimates that complement current children's lexical databases based on written language samples. It is freely accessible online at https://www.learn2read.cn/ccloow .
Affiliation(s)
- Luan Li
  - Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China
  - Shanghai Changning Mental Health Center, Shanghai, China
  - Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
- Wentao Zhao
  - Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China
  - Shanghai Changning Mental Health Center, Shanghai, China
  - Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
- Ming Song
  - Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China
  - Shanghai Changning Mental Health Center, Shanghai, China
  - Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
- Jing Wang
  - Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China
  - Shanghai Changning Mental Health Center, Shanghai, China
  - Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
- Qing Cai
  - Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China
  - Shanghai Changning Mental Health Center, Shanghai, China
  - Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
5
Armando M, Grainger J, Dufau S. Multi-LEX: A database of multi-word frequencies for French and English. Behav Res Methods 2023; 55:4315-4328. [PMID: 36443580 DOI: 10.3758/s13428-022-02018-9] [Accepted: 11/02/2022] [Indexed: 11/30/2022]
Abstract
Written word frequency is a key variable used in many psycholinguistic studies and is central in explaining visual word recognition. Indeed, methodological advances on single-word frequency estimates have helped to uncover novel language-related cognitive processes, fostering new ideas and studies. In an attempt to support and promote research on a related emerging topic, visual multi-word recognition, we extracted from the exhaustive Google Ngram datasets a selection of millions of multi-word sequences and computed their associated frequency estimate. Such sequences are presented with part-of-speech information for each individual word. An online behavioral investigation making use of the French 4-gram lexicon in a grammatical decision task was carried out. The results show an item-level frequency effect of word sequences. Moreover, the proposed datasets were found useful during the stimulus selection phase, allowing more precise control of the multi-word characteristics.
Affiliation(s)
- Marjorie Armando
  - Laboratoire de Psychologie Cognitive (UMR7290), CNRS & Aix-Marseille Université Case D, 3, place Victor HUGO, 13331, Marseille Cedex 3, France
  - Aix Marseille University, CNRS, LIS, Marseille, France
  - Pôle pilote Ampiric, Institut National Supérieur du Professorat et de l'Éducation, Aix-Marseille Université, Marseille, France
- Jonathan Grainger
  - Laboratoire de Psychologie Cognitive (UMR7290), CNRS & Aix-Marseille Université Case D, 3, place Victor HUGO, 13331, Marseille Cedex 3, France
- Stephane Dufau
  - Laboratoire de Psychologie Cognitive (UMR7290), CNRS & Aix-Marseille Université Case D, 3, place Victor HUGO, 13331, Marseille Cedex 3, France
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
6
Green C, Keogh K, Sun H, O'Brien B. The Children's Picture Books Lexicon (CPB-LEX): A large-scale lexical database from children's picture books. Behav Res Methods 2023:10.3758/s13428-023-02198-y. [PMID: 37566336 DOI: 10.3758/s13428-023-02198-y] [Accepted: 07/10/2023] [Indexed: 08/12/2023]
Abstract
This article presents CPB-LEX, a large-scale database of lexical statistics derived from children's picture books (age range 0-8 years). Such a database is essential for research in psychology, education and computational modelling, where rich details on the vocabulary of early print exposure are required. CPB-LEX was built through an innovative method of computationally extracting lexical information from automatic speech-to-text captions and subtitle tracks generated from social media channels dedicated to reading picture books aloud. It consists of approximately 25,585 types (wordforms) and their frequency norms (raw and Zipf-transformed), a lexicon of bigrams (two-word sequences and their transitional probabilities) and a document-term matrix (which shows the importance of each word in the corpus in each book). Several immediate contributions of CPB-LEX to behavioural science research are reported, including that the new CPB-LEX frequency norms strongly predict age of acquisition and outperform comparable child-input lexical databases. The database allows researchers and practitioners to extract lexical statistics for high-frequency words which can be used to develop word lists. The paper concludes with an investigation of how CPB-LEX can be used to extend recent modelling research on the lexical diversity children receive from picture books in addition to child-directed speech. Our model shows that the vocabulary input from a relatively small number of picture books can dramatically enrich vocabulary exposure from child-directed speech and potentially assist children with vocabulary input deficits. The database is freely available from the Open Science Framework repository: https://tinyurl.com/4este73c .
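As an illustration of the bigram lexicon mentioned above, the sketch below computes forward transitional probabilities, that is, the probability of the second word given the first, estimated as count(w1, w2) / count(w1 as a bigram's first word). Whether CPB-LEX uses forward or backward probabilities is not stated in the abstract, so the direction, the function name and the toy input are assumptions.

```python
from collections import Counter


def transitional_probabilities(tokens):
    """Map each bigram (w1, w2) to the estimated probability P(w2 | w1)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Count each word only where it starts a bigram, so the probabilities
    # for a given first word sum to 1.
    firsts = Counter(tokens[:-1])
    return {(w1, w2): c / firsts[w1] for (w1, w2), c in bigrams.items()}


tp = transitional_probabilities("the cat saw the dog".split())
# Here "the" is followed once by "cat" and once by "dog", so each of
# those two bigrams has probability 0.5.
```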
Affiliation(s)
- Clarence Green
  - Faculty of Education, University of Hong Kong, Pok Fu Lam, Hong Kong
- Kathleen Keogh
  - Senior Lecturer, Centre for Smart Analytics & Institute of Innovation, Science and Sustainability, Federation University Australia, Mount Helen, Australia
- He Sun
  - Centre for Research in Child Language, National Institute of Education, Nanyang Technological University, Singapore, Singapore
- Beth O'Brien
  - Centre for Research in Child Development, National Institute of Education, Nanyang Technological University, Singapore, Singapore
7
CCLOWW: A grade-level Chinese children's lexicon of written words. Behav Res Methods 2022:10.3758/s13428-022-01890-9. [PMID: 35776384 DOI: 10.3758/s13428-022-01890-9] [Accepted: 05/24/2022] [Indexed: 11/08/2022]
Abstract
In this article, we present the Chinese Children's Lexicon of Written Words (CCLOWW), the first grade-level database that provides frequency statistics of simplified Chinese characters and words for children. The database is computed from a corpus of 34,671,424 character tokens and 22,427,010 word tokens (including single- and multicharacter words), extracted from 2131 books. It contains 6746 different character types and 153,079 different word types. CCLOWW provides several frequency indices of simplified Chinese for three grade levels (grade 2 and below, grades 3-4, grades 5-6) to profile children's experience with written Chinese in and outside of school. We describe in this article the distributions of frequency and contextual diversity of the characters and words, as well as word length and syntactic categories of the words in the corpus and the subcorpora. We also report results of correlation analyses with other written corpora and of several naming and lexical decision experiments. The findings suggest that CCLOWW frequency measures correlate well with other corpora. Importantly, they could reliably predict children's and adults' naming and lexical decision performance. They could also explain variance in adults' visual word recognition, in addition to frequency measures computed in an adult corpus, indicating that early print exposure might influence readers' lexical processing later on beyond an age of acquisition effect. CCLOWW will help researchers in language processing and development as well as educators with selecting language materials appropriate for children's developmental stages. The database is freely available online at https://www.learn2read.cn/database/ .
8
Terzopoulos AR, Niolaki GZ, Masterson J. Intervention for a lexical reading and spelling difficulty in two Greek-speaking primary age children. Neuropsychol Rehabil 2018; 30:371-392. [PMID: 29756536 DOI: 10.1080/09602011.2018.1467330] [Indexed: 10/16/2022]
Abstract
An intervention study was carried out with two nine-year-old Greek-speaking dyslexic children. Both children were slow in reading single words and text and had difficulty in spelling irregularly spelled words. One child was also poor in non-word reading. Intervention focused on spelling in a whole-word training using a flashcard technique that had previously been found to be effective with English-speaking children. Post-intervention assessments conducted immediately at the end of the intervention, one month later and then five months later showed a significant improvement in spelling of treated words that was sustained over time. In addition, both children showed generalisation of improvement to untrained words and an increase in scores in a standardised spelling assessment. The findings support the effectiveness of theoretically based targeted intervention for literacy difficulties.
Affiliation(s)
- Aris R Terzopoulos
  - Institute of Education, University College London, London, UK
  - School of Social Sciences, University of Dundee, Dundee, UK
- Georgia Z Niolaki
  - Institute of Education, University College London, London, UK
  - School of Psychological, Social and Behavioural Sciences, Coventry University, Coventry, UK