1. Gordon MS, Ataucusi A. Continuous sliding frequency shifts produce an illusory tempo drift. JASA Express Lett 2021; 1:053202. PMID: 36154107. DOI: 10.1121/10.0005001.
Abstract
An acceleration or deceleration in the rate of music is described as a tempo drift. This study investigated how tempo drift is detected and how those judgments might be influenced by a sliding frequency shift. Tempo drifts of 10% or greater tended to be detected accurately, although judgments were somewhat affected by the type of music. Continuous sliding frequency shifts that paralleled the tempo drift improved the accuracy of tempo drift judgments, whereas frequency shifts independent of a tempo drift produced an illusory experience of a tempo change. Findings are reported with their implications for music and time perception.
Affiliation(s)
- Michael S Gordon, Department of Psychology, William Paterson University, 300 Pompton Road, Wayne, New Jersey 07470, USA
- Alejandro Ataucusi, Department of Psychology, William Paterson University, 300 Pompton Road, Wayne, New Jersey 07470, USA
2. Hsieh IH, Yeh WT. The Interaction Between Timescale and Pitch Contour at Pre-attentive Processing of Frequency-Modulated Sweeps. Front Psychol 2021; 12:637289. PMID: 33833720. PMCID: PMC8021897. DOI: 10.3389/fpsyg.2021.637289.
Abstract
Speech comprehension across languages depends on encoding the pitch variations in frequency-modulated (FM) sweeps at different timescales and frequency ranges. While the timescale and spectral contour of FM sweeps play important roles in differentiating acoustic speech units, relatively little work has been done to understand how the two acoustic dimensions interact during early cortical processing. An auditory oddball paradigm was employed to examine the interaction of timescale and pitch contour in the pre-attentive processing of FM sweeps. Event-related potentials were recorded to frequency sweeps in Mandarin Chinese that varied in linguistically relevant pitch contour (fundamental frequency F0 vs. first formant frequency F1) and timescale (local vs. global). Mismatch negativities (MMNs) were elicited by all types of sweep deviants. At the local timescale, FM sweeps with F0 contours yielded larger MMN amplitudes than those with F1 contours; the reverse amplitude pattern was obtained for global-timescale stimuli. An interhemispheric asymmetry of MMN topography was observed corresponding to local- and global-timescale contours, and in the difference waveforms, falling but not rising sweep contours elicited right-hemispheric dominance. The results show that timescale and pitch contour interact in the pre-attentive auditory processing of FM sweeps, suggesting that FM sweeps, a type of non-speech signal, are processed at an early stage with reference to their linguistic function. That this dynamic interaction between timescale and spectral pattern is resolved during early cortical processing of non-speech frequency sweeps may be critical for facilitating speech encoding at a later stage.
Affiliation(s)
- I-Hui Hsieh, Institute of Cognitive Neuroscience, National Central University, Taoyuan City, Taiwan
- Wan-Ting Yeh, Institute of Cognitive Neuroscience, National Central University, Taoyuan City, Taiwan
3. Tabas A, von Kriegstein K. Neural modelling of the encoding of fast frequency modulation. PLoS Comput Biol 2021; 17:e1008787. PMID: 33657098. PMCID: PMC7959405. DOI: 10.1371/journal.pcbi.1008787.
Abstract
Frequency modulation (FM) is a basic constituent of vocalisation in many animals as well as in humans. In human speech, short rising and falling FM-sweeps of around 50 ms duration, called formant transitions, characterise individual speech sounds. There are two representations of FM in the ascending auditory pathway: a spectral representation, holding the instantaneous frequency of the stimuli; and a sweep representation, consisting of neurons that respond selectively to FM direction. To date, computational models have used feedforward mechanisms to explain FM encoding. However, from neuroanatomy we know that there are massive feedback projections in the auditory pathway. Here, we found that a classical FM-sweep perceptual effect, the sweep pitch shift, cannot be explained by standard feedforward processing models. We hypothesised that the sweep pitch shift is caused by a predictive feedback mechanism. To test this hypothesis, we developed a novel model of FM encoding incorporating a predictive interaction between the sweep and the spectral representation. The model was designed to encode sweeps of the duration, modulation rate, and modulation shape of formant transitions. It fully accounted for experimental data that we acquired in a perceptual experiment with human participants, as well as for previously published experimental results. We also designed a new class of stimuli for a second perceptual experiment to further validate the model. Combined, our results indicate that predictive interaction between the frequency-encoding and direction-encoding neural representations plays an important role in the neural processing of FM. In the brain, this mechanism is likely to occur at early stages of the processing hierarchy.
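The formant transitions described above are short linear FM sweeps with a known start frequency, end frequency, and duration. A minimal synthesis sketch (hypothetical helper and parameter values, not the authors' stimulus code) might look like:

```python
import math

def linear_fm_sweep(f_start, f_end, duration, sr=16000):
    """Synthesise a linear FM sweep. Instantaneous frequency moves
    linearly from f_start to f_end over `duration` seconds."""
    n = int(duration * sr)
    samples = []
    for i in range(n):
        t = i / sr
        # Phase is the integral of the instantaneous frequency:
        # phi(t) = 2*pi * (f_start*t + (f_end - f_start) * t^2 / (2*duration))
        phase = 2 * math.pi * (f_start * t
                               + (f_end - f_start) * t * t / (2 * duration))
        samples.append(math.sin(phase))
    return samples

# A 50 ms rising sweep, roughly the duration of a formant transition
sweep = linear_fm_sweep(500.0, 1500.0, 0.050)
```

A falling sweep is obtained simply by swapping `f_start` and `f_end`.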
Affiliation(s)
- Alejandro Tabas, Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Saxony, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Saxony, Germany
- Katharina von Kriegstein, Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Saxony, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Saxony, Germany
4. Kung SJ, Wu DH, Hsu CH, Hsieh IH. A Minimum Temporal Window for Direction Detection of Frequency-Modulated Sweeps: A Magnetoencephalography Study. Front Psychol 2020; 11:389. PMID: 32218758. PMCID: PMC7078663. DOI: 10.3389/fpsyg.2020.00389.
Abstract
The ability to rapidly encode the direction of the frequency contour contained in frequency-modulated (FM) sweeps is essential for speech processing, music appreciation, and conspecific communication. Psychophysical evidence points to a common temporal window threshold for human listeners in processing rapid changes in frequency glides, but no neural evidence has been provided for a cortical temporal window threshold underlying the encoding of rapid transitions in frequency glides. The present magnetoencephalography study used the magnetic mismatch negativity (MMNm) to investigate the minimum temporal window required for detecting different magnitudes of directional change in frequency-modulated sweeps. An oddball paradigm was used in which a directional upward or downward frequency sweep served as the standard and the same sweep with the opposite direction served as the deviant. Stimuli consisted of unidirectional linear frequency-sweep complexes that swept across speech-relevant frequency bands over durations of 10, 20, 40, 80, 160, and 320 ms (corresponding rates of 50, 25, 12.5, 6.2, 3.1, and 1.5 oct/s). The data revealed significant magnetic mismatch field responses across all sweep durations, with slower-rate sweeps eliciting larger MMNm responses. A greater temporally related enhancement in MMNm response was obtained for rising but not falling frequency sweep contours, and a hemispheric asymmetry in the MMNm response pattern was observed corresponding to the directionality of the sweeps. Contrary to psychophysical findings, we report that a temporal window as short as 10 ms is sufficient to elicit a robust MMNm response to a directional change in speech-relevant frequency contours. The results suggest that auditory cortex requires only an extremely brief temporal window to implicitly differentiate a dynamic change in the frequency of linguistically relevant pitch contours. That the brain is extremely sensitive to fine spectral changes contained in speech-relevant glides provides cortical evidence for the ecological importance of FM sweeps in speech processing.
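The duration/rate pairs listed in the abstract are mutually consistent only if every sweep spans the same fixed frequency extent of roughly half an octave (an inference from the listed numbers, not something stated in the abstract). Under that assumption, the rates can be recomputed from first principles:

```python
import math

def sweep_rate_oct_per_s(f_start, f_end, duration):
    """Modulation rate in octaves per second for a sweep from
    f_start to f_end lasting `duration` seconds."""
    return abs(math.log2(f_end / f_start)) / duration

# Assuming a half-octave extent, the listed duration/rate pairs
# (10 ms -> 50 oct/s, ..., 320 ms -> ~1.5 oct/s) are recovered:
for duration, reported in [(0.010, 50), (0.020, 25), (0.040, 12.5),
                           (0.080, 6.2), (0.160, 3.1), (0.320, 1.5)]:
    rate = sweep_rate_oct_per_s(1000.0, 1000.0 * 2 ** 0.5, duration)
    print(f"{duration * 1000:.0f} ms: {rate:.2f} oct/s (reported {reported})")
```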
Affiliation(s)
- Shu-Jen Kung, Institute of Cognitive Neuroscience, National Central University, Taoyuan City, Taiwan
- Denise H Wu, Institute of Cognitive Neuroscience, National Central University, Taoyuan City, Taiwan
- Chun-Hsien Hsu, Institute of Cognitive Neuroscience, National Central University, Taoyuan City, Taiwan; Institute of Linguistics, Academia Sinica, Taipei, Taiwan
- I-Hui Hsieh, Institute of Cognitive Neuroscience, National Central University, Taoyuan City, Taiwan
5. Mehrabi A, Dixon S, Sandler MB. Vocal imitation of synthesised sounds varying in pitch, loudness and spectral centroid. J Acoust Soc Am 2017; 141:783. PMID: 28253682. DOI: 10.1121/1.4974825.
Abstract
Vocal imitations are often used to convey sonic ideas [Lemaitre, Dessein, Susini, and Aura (2011). Ecol. Psych. 23(4), 267-307]. For computer-based systems to interpret these vocalisations, it is advantageous to know what happens when people vocalise sounds whose acoustic features have different temporal envelopes. In the present study, 19 experienced musicians and music producers were asked to imitate 44 sounds with one or two feature envelopes applied. The study addresses two main questions: (1) How accurately can people imitate ramp and modulation envelopes for pitch, loudness, and spectral centroid? (2) What happens to this accuracy when people are asked to imitate two feature envelopes simultaneously? The results show that experienced musicians can imitate pitch, loudness, and spectral centroid accurately, and that imitation accuracy is generally preserved when the imitated stimuli combine two not-necessarily-congruent features. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously.
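Of the three imitated features, spectral centroid is the least self-explanatory: it is simply the amplitude-weighted mean frequency of a magnitude spectrum, and is a standard correlate of perceived brightness. A minimal sketch (hypothetical helper, not the authors' analysis code):

```python
def spectral_centroid(freqs, mags):
    """Amplitude-weighted mean frequency of a magnitude spectrum.
    `freqs` and `mags` are parallel lists of bin frequencies (Hz)
    and bin magnitudes."""
    total = sum(mags)
    if total == 0:
        raise ValueError("empty spectrum")
    return sum(f * m for f, m in zip(freqs, mags)) / total

# Equal energy at 100 Hz and 300 Hz puts the centroid at 200 Hz:
print(spectral_centroid([100.0, 300.0], [1.0, 1.0]))  # 200.0
```

Raising or lowering a sound's centroid over time (the "feature envelope" of the study) shifts this weighted mean up or down without necessarily changing pitch or loudness.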
Affiliation(s)
- Adib Mehrabi, Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
- Simon Dixon, Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
- Mark B Sandler, Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
6. d'Alessandro C, Feugère L, Le Beux S, Perrotin O, Rilliard A. Drawing melodies: evaluation of chironomic singing synthesis. J Acoust Soc Am 2014; 135:3601-3612. PMID: 24907823. DOI: 10.1121/1.4875718.
Abstract
Cantor Digitalis, a real-time formant synthesizer controlled by a graphic tablet and a stylus, is used to assess melodic precision and accuracy in singing synthesis. Melodic accuracy and precision are measured in three experiments for groups of 20 and 28 subjects. The task of the subjects is to sing musical intervals and short melodies, at various tempi, using chironomy (hand-controlled singing), mute chironomy (without audio feedback), and their own voices. The results show the high accuracy and precision obtained by all the subjects for chironomic control of singing synthesis. Some subjects performed significantly better in chironomic singing than in natural singing, although others showed comparable proficiency. For the chironomic condition, mean note accuracy is less than 12 cents and mean interval accuracy is less than 25 cents for all the subjects. Comparing chironomy and mute chironomy shows that the skills used for writing and drawing transfer to chironomic singing, but that audio feedback helps interval accuracy. Analysis of blind chironomy (without visual reference) indicates that visual feedback greatly improves both note and interval accuracy and precision. This study demonstrates the capabilities of chironomy as a precise and accurate means of controlling singing synthesis.
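The accuracy figures above are in cents, where 100 cents is one equal-tempered semitone and 1200 cents is one octave. The conversion from a frequency ratio is a short formula, sketched here:

```python
import math

def cents(f_ref, f_sung):
    """Interval between two frequencies in cents
    (1200 cents = one octave, 100 cents = one semitone)."""
    return 1200.0 * math.log2(f_sung / f_ref)

# A 12-cent note error (the mean accuracy reported above) is a
# frequency deviation of only about 0.7%:
print(round(2 ** (12 / 1200), 4))  # ~1.007
```

This is why cent-scale accuracy is a demanding criterion: a 12-cent error on A4 (440 Hz) is a deviation of roughly 3 Hz.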
Affiliation(s)
- Christophe d'Alessandro, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, LIMSI-CNRS, Université Paris Sud, 91405 Orsay, France
- Lionel Feugère, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, LIMSI-CNRS, Université Paris Sud, 91405 Orsay, France
- Sylvain Le Beux, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, LIMSI-CNRS, Université Paris Sud, 91405 Orsay, France
- Olivier Perrotin, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, LIMSI-CNRS, Université Paris Sud, 91405 Orsay, France
- Albert Rilliard, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, LIMSI-CNRS, Université Paris Sud, 91405 Orsay, France
7. Altmann CF, Gaese BH. Representation of frequency-modulated sounds in the human brain. Hear Res 2013; 307:74-85. PMID: 23933098. DOI: 10.1016/j.heares.2013.07.018.
Abstract
Frequency modulation is a ubiquitous sound feature present in the communicative sounds of various animal species and humans. Functional imaging of the human auditory system has seen remarkable advances in the last two decades, and studies pertaining to frequency modulation have centered on two major questions: (a) are there dedicated feature detectors encoding frequency modulation in the brain, and (b) is there concurrent representation with amplitude modulation, another temporal sound feature? In this review, we first describe how these two questions are motivated by psychophysical studies and neurophysiology in animal models. We then review how human non-invasive neuroimaging studies have furthered our understanding of the representation of frequency-modulated sounds in the brain. Finally, we conclude with some suggestions on how human neuroimaging could be used in future studies to address currently still open questions on this fundamental sound feature. This article is part of a Special Issue entitled Human Auditory Neuroimaging.
Affiliation(s)
- Christian F Altmann, Human Brain Research Center, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan; Career-Path Promotion Unit for Young Life Scientists, Kyoto University, Kyoto 606-8501, Japan
8. Hsieh IH, Fillmore P, Rong F, Hickok G, Saberi K. FM-selective networks in human auditory cortex revealed using fMRI and multivariate pattern classification. J Cogn Neurosci 2012; 24:1896-1907. PMID: 22640390. DOI: 10.1162/jocn_a_00254.
Abstract
Frequency modulation (FM) is an acoustic feature of nearly all complex sounds. Directional FM sweeps are especially pervasive in speech, music, animal vocalizations, and other natural sounds. Although the existence of FM-selective cells in the auditory cortex of animals has been documented, evidence in humans remains equivocal. Here we used multivariate pattern analysis to identify cortical selectivity for the direction of a multitone FM sweep. This method distinguishes one pattern of neural activity from another within the same ROI, even when the overall level of activity is similar, allowing direct identification of FM-specialized networks. Standard contrast analysis showed that, despite robust activity in auditory cortex, no clusters of activity were associated with up versus down sweeps. Multivariate pattern classification, however, identified two brain regions as selective for FM direction: the right primary auditory cortex on the supratemporal plane and the left anterior region of the superior temporal gyrus. These findings are the first to directly demonstrate the existence of FM direction selectivity in the human auditory cortex.
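The key property of multivariate pattern analysis described above, separating conditions whose overall activity level is identical, can be illustrated with a toy template-matching classifier. The numbers below are entirely synthetic and stand in for voxel responses; this is not the study's fMRI pipeline:

```python
# Two toy voxel patterns with identical mean activity but different
# spatial layout: a univariate contrast sees no difference, while a
# pattern measure does.
up_sweep   = [1.0, 0.0, 1.0, 0.0]   # responds in voxels 0 and 2
down_sweep = [0.0, 1.0, 0.0, 1.0]   # same mean, different pattern

mean_up = sum(up_sweep) / len(up_sweep)
mean_down = sum(down_sweep) / len(down_sweep)
print(mean_up == mean_down)  # True: overall activity level is identical

def dot(a, b):
    """Inner product, used here as a pattern-similarity score."""
    return sum(x * y for x, y in zip(a, b))

# A new, noisier "up" trial is classified by which template it matches:
trial = [0.9, 0.1, 0.8, 0.2]
label = "up" if dot(trial, up_sweep) > dot(trial, down_sweep) else "down"
print(label)  # up
```

A mean-activity contrast over these voxels returns zero difference, yet the pattern classifier recovers the condition, which is the same logic by which the study detected FM-direction selectivity that standard contrasts missed.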
Affiliation(s)
- I-Hui Hsieh, National Central University, Jhongli City, Taiwan
9. d'Alessandro C, Rilliard A, Le Beux S. Chironomic stylization of intonation. J Acoust Soc Am 2011; 129:1594-1604. PMID: 21428522. DOI: 10.1121/1.3531802.
Abstract
Intonation stylization is studied using "chironomy," i.e., the analogy between hand gestures and prosodic movements. An intonation-mimicking paradigm is used: the task of the ten subjects is to copy the intonation pattern of sentences with a stylus on a graphic tablet, using a system for real-time manual intonation modification. Gestural imitation is compared to vocal imitation of the same sentences (seven from a male speaker, seven from a female speaker). Distance measures between gestural copies, vocal imitations, and original sentences are computed for performance assessment. Perceptual testing is also used to assess the quality of the gestural copies: the perceptual difference between natural and stylized contours is measured using a mean opinion score paradigm with 15 subjects. The results indicate that intonation contours can be stylized accurately by chironomic imitation. The results of vocal and chironomic imitation are comparable, though subjects achieved better results with vocal imitation. The best stylized contours obtained by chironomy seem perceptually indistinguishable, or almost indistinguishable, from natural contours, particularly for female speech. This indicates that chironomic stylization is effective, and that hand movements can be analogous to intonation movements.
Affiliation(s)
- Christophe d'Alessandro, LIMSI-CNRS (Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur - Centre National de la Recherche Scientifique), BP 133, Orsay 91403, France
10. Hsieh IH, Saberi K. Detection of sinusoidal amplitude modulation in logarithmic frequency sweeps across wide regions of the spectrum. Hear Res 2010; 262:9-18. PMID: 20144700. DOI: 10.1016/j.heares.2010.02.002.
Abstract
Many natural sounds such as speech contain concurrent amplitude and frequency modulation (AM and FM), with the FM components often taking the form of directional frequency sweeps or glides. Most studies of modulation coding, however, have employed one modulation type in stationary carriers, and in cases where mixed-modulation sounds have been used, the FM component has typically been confined to an extremely narrow range within a critical band. The current study examined the ability to detect AM signals carried by broad logarithmic frequency sweeps, using a two-alternative forced-choice adaptive psychophysical design. AM-detection thresholds were measured as a function of signal modulation rate and carrier sweep frequency region. Thresholds for detection of AM in a sweep carrier ranged from -8 dB at an AM rate of 8 Hz to -30 dB at 128 Hz. Compared to thresholds obtained for stationary carriers (pure tones and filtered Gaussian noise), detection of AM carried by frequency sweeps declined substantially at low modulation rates (by 12 dB at 8 Hz) but not at high rates. Several trends in the data, including sweep- versus stationary-carrier threshold patterns and effects of frequency region, were predicted by a modulation filterbank model with an envelope-correlation decision statistic.
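The AM-depth thresholds above are expressed in dB as 20·log10 of the modulation index m. A sketch of such a mixed-modulation stimulus, sinusoidal AM imposed on a logarithmic (exponential-in-frequency) sweep carrier, is below; the function names and parameter values are hypothetical illustrations, not the authors' stimulus code:

```python
import math

def db_to_depth(depth_db):
    """Convert an AM depth in dB (20 * log10(m)) to modulation index m."""
    return 10 ** (depth_db / 20.0)

def am_log_sweep(f_start, f_end, am_rate, depth_db, duration, sr=16000):
    """Sinusoidal AM applied to a logarithmic frequency sweep carrier."""
    m = db_to_depth(depth_db)
    k = math.log(f_end / f_start) / duration  # exponential sweep constant
    out = []
    for i in range(int(duration * sr)):
        t = i / sr
        # Phase of an exponential sweep: integral of f_start * e^(k*t)
        phase = 2 * math.pi * f_start * (math.exp(k * t) - 1) / k
        env = 1.0 + m * math.sin(2 * math.pi * am_rate * t)
        out.append(env * math.sin(phase))
    return out

# The reported thresholds span modulation indices from m ~ 0.40
# (-8 dB, 8 Hz AM) down to m ~ 0.03 (-30 dB, 128 Hz AM):
print(round(db_to_depth(-8), 3), round(db_to_depth(-30), 3))
```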
Affiliation(s)
- I-Hui Hsieh, Institute of Cognitive Neuroscience, National Central University, Jhongli City, Taiwan
11. Delviniotis D, Kouroupetroglou G, Theodoridis S. Acoustic analysis of musical intervals in modern Byzantine Chant scales. J Acoust Soc Am 2008; 124:EL262-EL269. PMID: 19062796. DOI: 10.1121/1.2968299.
Abstract
The goal of this work is to investigate experimentally the musical intervals in modern Byzantine Chant performance and to compare the results with the equal temperament scales introduced by the Patriarchal Music Committee (PMC). Measurements were obtained from pressure and electroglottographic recordings of 13 famous chanters singing scales of all the music genera. The scales' microintervals were derived after pitch detection based on autocorrelation, cepstrum, and harmonic product spectrum analysis. The microintervallic differences between the experimental and PMC values were statistically analyzed, revealing large deviations in both the mean values and the standard deviations. Significant interaction effects were identified among some genera and between ascending and descending scale directions.
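Of the three pitch detectors named above, autocorrelation is the simplest to sketch: the estimated F0 corresponds to the lag at which the signal best correlates with itself. The following is a minimal, deliberately naive version (hypothetical helper, not the authors' analysis code):

```python
import math

def autocorr_pitch(signal, sr, f_min=60.0, f_max=600.0):
    """Estimate F0 by picking the autocorrelation peak between the
    lags corresponding to f_max and f_min."""
    lag_min = int(sr / f_max)
    lag_max = int(sr / f_min)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag
        r = sum(signal[i] * signal[i + lag]
                for i in range(len(signal) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag

# Sanity check on a pure 200 Hz tone (period = 40 samples at 8 kHz):
sr = 8000
tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(800)]
print(autocorr_pitch(tone, sr))  # 200.0
```

Real chant recordings would need windowing, normalisation, and octave-error handling; cepstrum and harmonic product spectrum methods, also used in the study, trade robustness differently but answer the same question.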