1. Demany L, Semal C, Pressnitzer D. Simple frequency ratios naturally make precisely perceived melodies. Curr Biol 2025;35:1649-1655.e3. PMID: 40081379. DOI: 10.1016/j.cub.2025.02.030.
Abstract
Almost all human music is built on discrete scales of pitch [1]. Culturally prominent scales, such as the diatonic major scale of Western music, make use of the simple frequency ratios 2:1, 3:2, and 4:3 between notes [2]. It is generally believed that these ratios were chosen to optimize the consonance of simultaneous notes [3,4,5,6,7]. Alternatively, or in addition, it is conceivable that these ratios are intrinsically advantageous for the perceptual encoding of melodies [8,9,10,11,12]. Here, we provide behavioral support for this hypothesis. In three experiments, young Western adults had to detect pitch anomalies ("sour notes") in partly random pure-tone melodies based on various musical scales, including novel ones. The task did not require any musical knowledge. Most importantly, the listeners were extensively trained in order to saturate familiarity with the scales: for a given scale and listener, more than 2,000 (up to 5,280) trials were run. Practice largely improved performance. This occurred even for the diatonic major scale, suggesting that performance in our task was not biased by previous musical enculturation [13,14]. Frequency ratio simplicity also favored performance. Crucially, its benefit was not smaller in the final test sessions, when performance for each scale was presumably optimal and no longer improvable by practice, than in the initial test sessions. Thus, frequency ratio simplicity appeared to be intrinsically advantageous, rather than advantageous merely due to familiarity. The naturalness of melodic intervals defined by simple frequency ratios is likely to have contributed to the cultural selection of musical scales.
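To make "frequency ratio simplicity" concrete: the simple just-intonation ratios mentioned above sit very close to the corresponding equal-tempered intervals, and the standard cents formula quantifies the difference. The following sketch is purely illustrative (the ratios and the 1200·log2 conversion are textbook values, not details of the study's stimuli):

```python
import math

# Standard just-intonation ratios for a few diatonic intervals
# (textbook values, not the study's exact stimuli).
just_ratios = {
    "perfect fourth": (4, 3),
    "perfect fifth": (3, 2),
    "octave": (2, 1),
}

def cents(ratio):
    """Convert a frequency ratio to cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

for name, (p, q) in just_ratios.items():
    just = cents(p / q)
    nearest_step = round(just / 100)  # nearest 12-tone equal-temperament step
    print(f"{name:15s} {p}:{q}  {just:7.2f} cents "
          f"(deviation from 12-TET: {just - 100 * nearest_step:+.2f} cents)")
```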
Affiliation(s)
- Laurent Demany
  - Institut de Neurosciences Cognitives et Intégratives d'Aquitaine, CNRS, Université de Bordeaux, Bâtiment BBS, 2 rue Dr. Hoffmann Martinot, 33000 Bordeaux, France
- Catherine Semal
  - Institut de Neurosciences Cognitives et Intégratives d'Aquitaine, CNRS, Université de Bordeaux, Bâtiment BBS, 2 rue Dr. Hoffmann Martinot, 33000 Bordeaux, France
  - Ecole Nationale Supérieure de Cognitique, Bordeaux INP, 109 avenue Roul, 33400 Talence, France
- Daniel Pressnitzer
  - Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, CNRS, PSL University, 29 rue d'Ulm, 75005 Paris, France
3. Marjieh R, Sucholutsky I, van Rijn P, Jacoby N, Griffiths TL. Large language models predict human sensory judgments across six modalities. Sci Rep 2024;14:21445. PMID: 39271909. PMCID: PMC11399123. DOI: 10.1038/s41598-024-72071-1.
Abstract
Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that co-training a model (GPT-4) on vision and language does not necessarily yield improvements specific to the visual modality: its predictions correlate highly with human data whether the model is given direct visual input or purely textual descriptors. To study the impact of specific languages, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation between English and Russian, illuminating the interaction of language and perception.
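The core procedure, eliciting pairwise similarity judgments and correlating them with human ratings, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: `query_llm`, the stimuli, and the "human" ratings are invented placeholders, and a cheap string heuristic stands in for an actual model call so the script runs end to end.

```python
from itertools import combinations
from difflib import SequenceMatcher
from scipy.stats import spearmanr

# Placeholder stimuli and made-up "human" similarity ratings (nearby hues
# treated as more similar); neither is taken from the actual datasets.
stimuli = ["red", "orange", "yellow", "green", "blue"]
human_similarity = {}
for (i, a), (j, b) in combinations(enumerate(stimuli), 2):
    human_similarity[(a, b)] = 1.0 / (1 + abs(i - j))

def query_llm(a, b):
    """Hypothetical stand-in for prompting a language model with
    'how similar are a and b, on a scale from 0 to 1?' and parsing the number.
    A cheap string heuristic keeps the sketch runnable without an API key."""
    return SequenceMatcher(None, a, b).ratio()

model_similarity = {pair: query_llm(*pair) for pair in human_similarity}

# Rank correlation between model and human judgments over the same pairs.
pairs = sorted(human_similarity)
rho, p = spearmanr([model_similarity[k] for k in pairs],
                   [human_similarity[k] for k in pairs])
print(f"Spearman rho between model and human similarity judgments: {rho:.2f}")
```

In the study itself, the model's replies come from prompting GPT models and the human data from the six psychophysical datasets.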
Affiliation(s)
- Raja Marjieh
  - Department of Psychology, Princeton University, Princeton, USA
- Ilia Sucholutsky
  - Department of Computer Science, Princeton University, Princeton, USA
- Pol van Rijn
  - Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Nori Jacoby
  - Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Department of Psychology, Cornell University, Ithaca, USA
- Thomas L Griffiths
  - Department of Psychology, Princeton University, Princeton, USA
  - Department of Computer Science, Princeton University, Princeton, USA
4. Jacoby N, Polak R, Grahn JA, Cameron DJ, Lee KM, Godoy R, Undurraga EA, Huanca T, Thalwitzer T, Doumbia N, Goldberg D, Margulis EH, Wong PCM, Jure L, Rocamora M, Fujii S, Savage PE, Ajimi J, Konno R, Oishi S, Jakubowski K, Holzapfel A, Mungan E, Kaya E, Rao P, Rohit MA, Alladi S, Tarr B, Anglada-Tort M, Harrison PMC, McPherson MJ, Dolan S, Durango A, McDermott JH. Commonality and variation in mental representations of music revealed by a cross-cultural comparison of rhythm priors in 15 countries. Nat Hum Behav 2024;8:846-877. PMID: 38438653. PMCID: PMC11132990. DOI: 10.1038/s41562-023-01800-9.
Abstract
Music is present in every known society but varies from place to place. What, if anything, is universal to music cognition? We measured a signature of mental representations of rhythm in 39 participant groups in 15 countries, spanning urban societies and Indigenous populations. Listeners reproduced random 'seed' rhythms; their reproductions were fed back as the stimulus (as in the game of 'telephone'), such that their biases (the prior) could be estimated from the distribution of reproductions. Every tested group showed a sparse prior with peaks at integer-ratio rhythms. However, the importance of different integer ratios varied across groups, often reflecting local musical practices. Our results suggest a common feature of music cognition: discrete rhythm 'categories' at small-integer ratios. These discrete representations plausibly stabilize musical systems in the face of cultural transmission but interact with culture-specific traditions to yield the diversity that is evident when mental representations are probed across many cultures.
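The "telephone" logic, estimating a prior from the distribution of iterated reproductions, can be illustrated with a toy simulation. The sketch below is not the paper's measurement model: it simply assumes, for illustration, that each reproduction drifts toward the nearest small-integer-ratio category and adds noise, so that after several iterations the reproductions pile up at those ratios.

```python
import random

# Toy rhythm space: a two-interval cycle summarized by the fraction of the
# cycle taken by the first interval. Small-integer duration ratios become
# simple fractions (1:1 -> 0.5, 2:1 -> 2/3, and so on).
categories = [a / (a + b) for a, b in [(1, 1), (2, 1), (1, 2), (3, 1), (1, 3)]]

def reproduce(fraction, pull=0.3, noise=0.02):
    """One simulated reproduction: drift toward the nearest integer-ratio
    category plus motor noise (an illustrative assumption, not the paper's model)."""
    nearest = min(categories, key=lambda c: abs(c - fraction))
    value = fraction + pull * (nearest - fraction) + random.gauss(0, noise)
    return min(max(value, 0.05), 0.95)

random.seed(0)
final_states = []
for _ in range(2000):                 # independent "telephone" chains
    state = random.uniform(0.1, 0.9)  # random seed rhythm
    for _ in range(10):               # iterated reproductions within a chain
        state = reproduce(state)
    final_states.append(state)

# The distribution of final reproductions approximates the prior:
# it peaks near the integer-ratio categories (0.25, 1/3, 0.5, 2/3, 0.75).
for i in range(1, 19):
    lo = i / 20
    count = sum(lo <= s < lo + 0.05 for s in final_states)
    print(f"{lo:.2f}-{lo + 0.05:.2f} {'#' * (count // 20)}")
```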
Affiliation(s)
- Nori Jacoby
  - Computational Auditory Perception Group, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Presidential Scholars in Society and Neuroscience, Columbia University, New York, NY, USA
- Rainer Polak
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Blindern, Oslo, Norway
- Jessica A Grahn
  - Brain and Mind Institute and Department of Psychology, University of Western Ontario, London, Ontario, Canada
- Daniel J Cameron
  - Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, Ontario, Canada
- Kyung Myun Lee
  - School of Digital Humanities and Social Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
  - Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Ricardo Godoy
  - Heller School for Social Policy and Management, Brandeis University, Waltham, MA, USA
- Eduardo A Undurraga
  - Escuela de Gobierno, Pontificia Universidad Católica de Chile, Santiago, Chile
  - CIFAR Azrieli Global Scholars programme, CIFAR, Toronto, Ontario, Canada
- Tomás Huanca
  - Centro Boliviano de Investigación y Desarrollo Socio Integral, San Borja, Bolivia
- Noumouké Doumbia
  - Sciences de l'Education, Université Catholique d'Afrique de l'Ouest, Bamako, Mali
- Daniel Goldberg
  - Department of Music, University of Connecticut, Storrs, CT, USA
- Patrick C M Wong
  - Department of Linguistics & Modern Languages and Brain and Mind Institute, Chinese University of Hong Kong, Hong Kong SAR, China
- Luis Jure
  - School of Music, Universidad de la República, Montevideo, Uruguay
- Martín Rocamora
  - Signal Processing Department, School of Engineering, Universidad de la República, Montevideo, Uruguay
  - Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
- Shinya Fujii
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
- Patrick E Savage
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
  - School of Psychology, University of Auckland, Auckland, New Zealand
- Jun Ajimi
  - Department of Traditional Japanese Music, Tokyo University of the Arts, Tokyo, Japan
- Rei Konno
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
- Sho Oishi
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
- Andre Holzapfel
  - Division of Media Technology and Interaction Design, KTH Royal Institute of Technology, Stockholm, Sweden
- Esra Mungan
  - Department of Psychology, Bogazici University, Istanbul, Turkey
- Ece Kaya
  - Max Planck Research Group 'Neural and Environmental Rhythms', Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Cognitive Science Master Program, Bogazici University, Istanbul, Turkey
- Preeti Rao
  - Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
- Mattur A Rohit
  - Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
- Bronwyn Tarr
  - Department of Cognitive and Evolutionary Anthropology, University of Oxford, Oxford, UK
  - Department of Experimental Psychology, University of Oxford, Oxford, UK
- Manuel Anglada-Tort
  - Computational Auditory Perception Group, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Department of Psychology, Goldsmiths, University of London, London, UK
- Peter M C Harrison
  - Computational Auditory Perception Group, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Faculty of Music, University of Cambridge, Cambridge, UK
- Malinda J McPherson
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sophie Dolan
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Brain and Cognitive Sciences, Wellesley College, Wellesley, MA, USA
- Alex Durango
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Neurosciences Graduate Program, Stanford University, Stanford, CA, USA
- Josh H McDermott
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Center for Brains, Minds & Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
5. Bruder C, Poeppel D, Larrouy-Maestri P. Perceptual (but not acoustic) features predict singing voice preferences. Sci Rep 2024;14:8977. PMID: 38637516. PMCID: PMC11026466. DOI: 10.1038/s41598-024-58924-9.
Abstract
Why do we prefer some singers to others? We investigated how much singing voice preferences can be traced back to objective features of the stimuli. To do so, we asked participants to rate short excerpts of singing performances in terms of how much they liked them, as well as in terms of 10 perceptual attributes (e.g., pitch accuracy, tempo, breathiness). We modeled liking ratings from these perceptual ratings, as well as from acoustic features and low-level features derived from Music Information Retrieval (MIR). Mean liking ratings for each stimulus were highly correlated between Experiments 1 (online, US-based participants) and 2 (in the lab, German participants), suggesting a role for attributes of the stimuli in grounding average preferences. We show that acoustic and MIR features barely explain any variance in liking ratings; in contrast, perceptual features of the voices predicted around 43% of the variance. Inter-rater agreement in liking and perceptual ratings was low, indicating substantial (and unsurprising) individual differences in participants' preferences and perception of the stimuli. Our results indicate that singing voice preferences are not grounded in acoustic attributes of the voices per se, but in how these features are perceptually interpreted by listeners.
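The modelling comparison described above, predicting liking from perceptual ratings versus acoustic/MIR features, amounts to fitting regressions on the two feature sets and comparing cross-validated performance. A minimal sketch on synthetic placeholder data (the feature counts and the data-generating assumptions are invented for illustration, not the study's dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stimuli = 96  # placeholder number of sung excerpts

# Synthetic stand-ins: by construction, liking depends on the perceptual
# ratings but not on the acoustic/MIR features.
perceptual = rng.normal(size=(n_stimuli, 10))  # e.g. rated pitch accuracy, tempo, breathiness
acoustic = rng.normal(size=(n_stimuli, 12))    # e.g. acoustic/MIR descriptors
liking = perceptual @ rng.normal(size=10) + rng.normal(scale=2.0, size=n_stimuli)

for name, X in [("perceptual ratings", perceptual), ("acoustic/MIR features", acoustic)]:
    # Cross-validated R^2: how much variance in mean liking each feature set explains.
    r2 = cross_val_score(LinearRegression(), X, liking, cv=5, scoring="r2")
    print(f"{name:22s} mean CV R^2 = {r2.mean():.2f}")
```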
Affiliation(s)
- Camila Bruder
  - Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- David Poeppel
  - New York University, New York, NY, USA
  - Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
  - Max Planck-NYU Center for Language, Music, and Emotion (CLaME), New York, USA
- Pauline Larrouy-Maestri
  - Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Max Planck-NYU Center for Language, Music, and Emotion (CLaME), New York, USA
6. Shilton D, Passmore S, Savage PE. Group singing is globally dominant and associated with social context. R Soc Open Sci 2023;10:230562. PMID: 37680502. PMCID: PMC10480695. DOI: 10.1098/rsos.230562.
Abstract
Music is an interactive technology associated with religious and communal activities and has been suggested to have evolved as a participatory activity supporting social bonding. In post-industrial societies, however, music's communal role was eclipsed by its relatively passive consumption by audiences disconnected from performers. It has also been suggested that as societies became larger and more differentiated, music became less participatory and more focused on solo singing. Here, we consider the prevalence of group singing and its relationship to social organization through the analysis of two global song corpora: 5776 coded audio recordings from 1024 societies, and 4709 coded ethnographic texts from 60 societies. In both corpora, we find that group singing is more common than solo singing, and that it is more likely in some social contexts (e.g. religious rituals, dance) than in others (e.g. healing, infant care). In contrast, relationships between group singing and social structure (community size or social differentiation) were not consistent within or between corpora. While we cannot exclude the possibility of sampling bias leading to systematic under-sampling of solo singing, our results from two large global corpora of different data types provide support for the interactive nature of music and its complex relationship with sociality.
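The headline comparison, group singing being more common overall and more likely in some social contexts than others, boils down to conditional proportions over coded recordings. A minimal sketch on made-up coded data (the column names and context labels are placeholders, not the corpora's actual coding scheme):

```python
import pandas as pd

# Made-up coded recordings: one row per song, with a social-context code
# and whether the singing was performed by a group.
songs = pd.DataFrame({
    "context": ["ritual", "dance", "healing", "infant care", "dance", "ritual",
                "healing", "infant care", "dance", "ritual"],
    "group_singing": [True, True, False, False, True, True, False, True, True, False],
})

overall = songs["group_singing"].mean()
by_context = songs.groupby("context")["group_singing"].mean().sort_values(ascending=False)

print(f"Overall proportion of group singing: {overall:.2f}")
print(by_context)
```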
Affiliation(s)
- Dor Shilton
  - Cohn Institute for the History and Philosophy of Science and Ideas, Tel Aviv University, Tel Aviv, Israel
- Sam Passmore
  - Evolution of Cultural Diversity Initiative, Australian National University, Canberra, Australia
  - Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Patrick E. Savage
  - School of Psychology, University of Auckland, Auckland, New Zealand
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan