1
Holler J. Facial clues to conversational intentions. Trends Cogn Sci 2025:S1364-6613(25)00079-8. [PMID: 40345945] [DOI: 10.1016/j.tics.2025.03.006]
Abstract
It has long been known that we use words to perform speech acts foundational to everyday conversation, such as requesting, informing, proposing, or complaining. However, the natural environment of human language is face-to-face interaction where we use words and an abundance of visual signals to communicate. The multimodal nature of human language is increasingly recognised in the language and cognitive sciences. In line with this turn of the tide, findings demonstrate that facial signals significantly contribute to communicating intentions and that they may facilitate pragmatically appropriate responding in the fast-paced environment of conversation. In light of this, the notion of speech acts no longer seems appropriate, highlighting the need for a modality-neutral conception, such as social action.
Affiliation(s)
- Judith Holler
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, and Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
2
Ter Bekke M, Drijvers L, Holler J. Co-Speech Hand Gestures Are Used to Predict Upcoming Meaning. Psychol Sci 2025; 36:237-248. [PMID: 40261301] [DOI: 10.1177/09567976251331041]
Abstract
In face-to-face conversation, people use speech and gesture to convey meaning. Seeing gestures alongside speech facilitates comprehenders' language processing, but the mechanisms underlying this facilitation remain unclear. We investigated whether comprehenders use the semantic information in gestures, which typically precede their related speech, to predict upcoming meaning. Dutch adults listened to questions asked by a virtual avatar. Questions were accompanied by an iconic gesture (e.g., a typing gesture) or a meaningless control movement (e.g., an arm scratch), followed by a short pause and a target word (e.g., "type"). A Cloze experiment showed that gestures improved explicit predictions of upcoming target words. Moreover, an EEG experiment showed that gestures reduced alpha and beta power during the pause, indicating anticipation, and reduced N400 amplitudes, demonstrating facilitated semantic processing. Thus, comprehenders use iconic gestures to predict upcoming meaning. Theories of linguistic prediction should incorporate communicative bodily signals as predictive cues to capture how language is processed in face-to-face interaction.
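The EEG effect reported here is, at its core, a comparison of band-limited power (alpha, 8-12 Hz; beta, 13-30 Hz) between gesture and control trials during the silent pause. Below is a minimal sketch of such a comparison, assuming synthetic epoched arrays rather than the authors' actual preprocessing pipeline (the array names, sampling rate, and trial counts are illustrative only).

```python
import numpy as np
from scipy.signal import welch

FS = 500  # sampling rate in Hz (assumed)

def band_power(epochs, fs, lo, hi):
    """Mean power in the [lo, hi] Hz band per trial, averaged over channels.

    epochs: array of shape (n_trials, n_channels, n_samples).
    """
    freqs, psd = welch(epochs, fs=fs, nperseg=fs // 2, axis=-1)
    band = (freqs >= lo) & (freqs <= hi)
    return psd[..., band].mean(axis=(-1, -2))  # -> (n_trials,)

# Illustrative stand-ins for epoched data from the pre-target pause window.
rng = np.random.default_rng(0)
pause_epochs_gesture = rng.standard_normal((40, 32, FS))  # 40 trials, 32 channels, 1 s
pause_epochs_control = rng.standard_normal((40, 32, FS))

for name, (lo, hi) in {"alpha": (8, 12), "beta": (13, 30)}.items():
    g = band_power(pause_epochs_gesture, FS, lo, hi)
    c = band_power(pause_epochs_control, FS, lo, hi)
    print(f"{name}: gesture={g.mean():.4f}  control={c.mean():.4f}")
```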
Affiliation(s)
- Marlijn Ter Bekke
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Linda Drijvers
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Judith Holler
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
3
Ünal E, Kırbaşoğlu K, Karadöller DZ, Sümer B, Özyürek A. Gesture Reduces Mapping Difficulties in the Development of Spatial Language Depending on the Complexity of Spatial Relations. Cogn Sci 2025; 49:e70046. [PMID: 39992958] [PMCID: PMC11849910] [DOI: 10.1111/cogs.70046]
Abstract
In spoken languages, children acquire locative terms in a cross-linguistically stable order. Terms similar in meaning to in and on emerge earlier than those similar to front and behind, followed by left and right. This order has been attributed to the complexity of the relations expressed by different locative terms. An additional possibility is that children may be delayed in expressing certain spatial meanings partly due to difficulties in discovering the mappings between locative terms in speech and the spatial relations they express. We investigate cognitive and mapping difficulties in the domain of spatial language by comparing how children map spatial meanings onto speech versus visually motivated forms in co-speech gesture across different spatial relations. Twenty-four 8-year-old and 23 adult native Turkish speakers described four-picture displays in which the target picture depicted in-on, front-behind, or left-right relations between objects. As the complexity of spatial relations increased, children were more likely to rely on gestures, as opposed to speech, to informatively express the spatial relation. Adults overwhelmingly relied on speech to informatively express the spatial relation, and this did not change with the complexity of spatial relations. Nevertheless, even when spatial expressions in both speech and co-speech gesture were considered, children lagged behind adults in expressing the most complex left-right relations. These findings suggest that cognitive development and mapping difficulties introduced by the modality of expression interact in shaping the development of spatial language.
Affiliation(s)
- Ercenur Ünal
- Multimodal Language Department, Max Planck Institute for Psycholinguistics
- Department of Psychology, Ozyegin University
- Dilay Z. Karadöller
- Multimodal Language Department, Max Planck Institute for Psycholinguistics
- Department of Psychology, Middle East Technical University
- Beyza Sümer
- Multimodal Language Department, Max Planck Institute for Psycholinguistics
- Department of Linguistics, University of Amsterdam
- Aslı Özyürek
- Multimodal Language Department, Max Planck Institute for Psycholinguistics
- Donders Institute for Brain, Cognition and Behaviour, Radboud University
4
Emmendorfer AK, Holler J. Facial signals shape predictions about the nature of upcoming conversational responses. Sci Rep 2025; 15:1381. [PMID: 39779723] [PMCID: PMC11711643] [DOI: 10.1038/s41598-025-85192-y]
Abstract
Increasing evidence suggests that interlocutors use visual communicative signals to form predictions about unfolding utterances, but there is little data on the predictive potential of facial signals in conversation. In an online experiment with virtual agents, we examine whether facial signals produced by an addressee may allow speakers to anticipate the response to a question before it is given. Participants (n = 80) viewed videos of short conversation fragments between two virtual humans. Each fragment ended with the Questioner asking a question, followed by a pause during which the Responder either looked straight at the Questioner (baseline), averted their gaze, or accompanied the straight gaze with one of the following facial signals: brow raise, brow frown, nose wrinkle, smile, squint, or mouth corner pulled back (dimpler). Participants then indicated on a 6-point scale whether they expected a "yes" or "no" response. Analyses revealed that all signals received different ratings relative to the baseline: brow raises, dimplers, and smiles were associated with more positive responses, and gaze aversions, brow frowns, nose wrinkles, and squints with more negative responses. Our findings show that interlocutors may form strong associations between facial signals and upcoming responses to questions, highlighting their predictive potential in face-to-face conversation.
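The core analysis amounts to comparing expectation ratings for each facial signal against the straight-gaze baseline while accounting for repeated measures per participant. A minimal sketch of one way to set this up on simulated long-format data (the column names, factor levels, and the linear mixed model are assumptions for illustration, not necessarily the authors' exact analysis):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
signals = ["baseline", "gaze_aversion", "brow_raise", "brow_frown",
           "nose_wrinkle", "smile", "squint", "dimpler"]

# Illustrative long-format data: one row per trial, rating on the 1-6 scale.
df = pd.DataFrame({
    "participant": np.repeat(np.arange(80), len(signals) * 4),
    "signal": np.tile(np.repeat(signals, 4), 80),
})
df["rating"] = rng.integers(1, 7, size=len(df)).astype(float)

# Treatment coding with the straight-gaze baseline as the reference level;
# a random intercept per participant absorbs between-subject rating biases.
model = smf.mixedlm("rating ~ C(signal, Treatment('baseline'))",
                    data=df, groups=df["participant"])
result = model.fit()
print(result.summary())  # each coefficient: shift in rating relative to baseline
```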
Affiliation(s)
- Alexandra K Emmendorfer
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Wundtlaan 1, 6525 XD Nijmegen, The Netherlands.
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
- Judith Holler
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Wundtlaan 1, 6525 XD Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
5
Rühlemann C, Trujillo J. The effect of gesture expressivity on emotional resonance in storytelling interaction. Front Psychol 2024; 15:1477263. [PMID: 39802978] [PMCID: PMC11721651] [DOI: 10.3389/fpsyg.2024.1477263]
Abstract
The key function of storytelling is a meeting of hearts: a resonance in the recipient(s) of the story narrator's emotion toward the story events. This paper focuses on the role of gestures in engendering emotional resonance in conversational storytelling. The paper asks three questions: Does story narrators' gesture expressivity increase from story onset to climax offset (RQ #1)? Does gesture expressivity predict specific EDA responses in story participants (RQ #2)? How important is the contribution of gesture expressivity to emotional resonance compared to the contribution of other predictors of resonance (RQ #3)? Fifty-three conversational stories were annotated for a large number of variables including Protagonist, Recency, Group composition, Group size, Sentiment, and co-occurrence with quotation. The gestures in the stories were coded for gesture phases and gesture kinematics, including Size, Force, Character viewpoint, Silence during gesture, Presence of hold phase, Co-articulation with other bodily organs, and Nucleus duration. The Gesture Expressivity Index (GEI) provides an average of these parameters. Resonating gestures were identified, i.e., gestures exhibiting concurrent specific EDA responses by two or more participants. The first statistical model, which addresses RQ #1, suggested that story narrators' gestures become more expressive from story onset to climax offset. The model constructed to address RQ #2 suggested that increased gesture expressivity increases the probability of specific EDA responses. To address RQ #3, a Random Forest was constructed with emotional resonance as the outcome variable and the seven GEI parameters, as well as six further variables, as predictors. All predictors were found to impact emotional resonance. Analysis of variable importance showed Group composition to be the most impactful predictor. Inspection of ICE plots clearly indicated combined effects of individual GEI parameters and other factors, including Group size and Group composition. This study shows that more expressive gestures are more likely to elicit physiological resonance between individuals, suggesting an important role for gestures in connecting people during conversational storytelling. Methodologically, this study opens up new avenues of multimodal corpus-linguistic research by examining the interplay of emotion-related measurements and gesture at micro-analytic kinematic levels and by using advanced machine-learning methods to deal with the inherent collinearity of multimodal variables.
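The pipeline described above combines a composite expressivity score with a Random Forest, variable importance, and ICE plots. A minimal sketch of that logic with scikit-learn on toy data (the column names, the z-score averaging used for the GEI, and the outcome coding are illustrative assumptions, not the authors' exact implementation):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

rng = np.random.default_rng(2)
n = 500  # illustrative number of gestures

# Toy stand-ins for the seven GEI kinematic parameters plus context variables.
gei_params = ["size", "force", "character_viewpoint", "silence_during_gesture",
              "hold_phase", "co_articulation", "nucleus_duration"]
X = pd.DataFrame(rng.random((n, len(gei_params))), columns=gei_params)
X["group_size"] = rng.integers(3, 6, size=n)
X["group_composition"] = rng.integers(0, 2, size=n)  # e.g., familiar vs. unfamiliar

# Composite expressivity index: mean of z-scored kinematic parameters.
z = (X[gei_params] - X[gei_params].mean()) / X[gei_params].std()
X["gei"] = z.mean(axis=1)

# Outcome: did the gesture resonate (concurrent EDA responses in two or more listeners)?
y = (rng.random(n) < 0.3).astype(int)

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
imp = permutation_importance(forest, X, y, n_repeats=20, random_state=0)
for col, score in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{col:22s} {score:.4f}")

# ICE curves for one predictor, analogous to the ICE plots inspected in the paper.
PartialDependenceDisplay.from_estimator(forest, X, ["gei"], kind="individual")
```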
Affiliation(s)
- Christoph Rühlemann
- Deutsches Seminar - Germanistische Linguistik, University of Freiburg, Freiburg, Germany
- James Trujillo
- Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, Netherlands
6
Leonetti S, Ravignani A, Pouw W. A cross-species framework for classifying sound-movement couplings. Neurosci Biobehav Rev 2024; 167:105911. [PMID: 39362418] [DOI: 10.1016/j.neubiorev.2024.105911]
Abstract
Sound and movement are entangled in animal communication. This is obviously true in the case of sound-constituting vibratory movements of biological structures that generate acoustic waves. A little less obvious is that other moving structures produce the energy required to sustain these vibrations. In many species, the respiratory system moves to generate the expiratory flow which powers the sound-constituting movements (sound-powering movements). The sound may acquire additional structure via upper-tract movements, such as articulatory movements or head raising (sound-filtering movements). Some movements are not necessary for sound production, but when produced, impinge on the sound-producing process due to weak biomechanical coupling with body parts (e.g., the respiratory system) that are necessary for sound production (sound-impinging movements). Animals also produce sounds contingent with movement, requiring neurophysiological control regimes that allow movements to be flexibly coupled to a produced sound or to a perceived external sound (sound-contingent movements). Here, we compare and classify the variety of ways sound and movements are coupled in animal communication; our proposed framework should help structure previous and future studies on this topic.
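Because the contribution is a classification framework, its categories can be written down as a simple data structure. A minimal sketch, with category names taken from the abstract and purely illustrative example observations:

```python
from dataclasses import dataclass
from enum import Enum, auto

class SoundMovementCoupling(Enum):
    """Categories of sound-movement coupling proposed in the framework."""
    SOUND_CONSTITUTING = auto()  # vibratory movements that generate the acoustic wave
    SOUND_POWERING = auto()      # movements producing the energy (e.g., expiratory flow)
    SOUND_FILTERING = auto()     # upper-tract movements shaping the sound
    SOUND_IMPINGING = auto()     # movements leaking into sound production via biomechanics
    SOUND_CONTINGENT = auto()    # movements flexibly coupled to a produced or perceived sound

@dataclass
class MovementObservation:
    species: str
    description: str
    coupling: SoundMovementCoupling

# Illustrative observations tagged with the framework's categories.
examples = [
    MovementObservation("human", "vocal fold vibration", SoundMovementCoupling.SOUND_CONSTITUTING),
    MovementObservation("human", "expiratory chest movement", SoundMovementCoupling.SOUND_POWERING),
    MovementObservation("human", "beat gesture timed to the voice", SoundMovementCoupling.SOUND_CONTINGENT),
]
for obs in examples:
    print(f"{obs.species}: {obs.description} -> {obs.coupling.name}")
```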
Affiliation(s)
- Silvia Leonetti
- Department of Life Sciences and Systems Biology, University of Turin, Via Accademia Albertina 13, Turin 10123, Italy; Department of Human Neurosciences, Sapienza University of Rome, Piazzale Aldo Moro 5, Rome 00185, Italy; Comparative Bioacoustics Research Group, Max Planck Institute for Psycholinguistics, Wundtlaan 1, Nijmegen 6525 XD, the Netherlands.
- Andrea Ravignani
- Department of Human Neurosciences, Sapienza University of Rome, Piazzale Aldo Moro 5, Rome 00185, Italy; Comparative Bioacoustics Research Group, Max Planck Institute for Psycholinguistics, Wundtlaan 1, Nijmegen 6525 XD, the Netherlands; Center for Music in the Brain, Department of Clinical Medicine, Aarhus University & The Royal Academy of Music Aarhus/Aalborg, Aarhus C 8000, Denmark
- Wim Pouw
- Donders Institute for Brain, Cognition, and Behavior, Radboud University Nijmegen, Houtlaan 4, Nijmegen 6525 XZ, the Netherlands.
7
Clough S, Brown-Schmidt S, Cho SJ, Duff MC. Reduced on-line speech gesture integration during multimodal language processing in adults with moderate-severe traumatic brain injury: Evidence from eye-tracking. Cortex 2024; 181:26-46. [PMID: 39488986] [DOI: 10.1016/j.cortex.2024.08.008]
Abstract
BACKGROUND: Language is multimodal and situated in rich visual contexts. Language is also incremental, unfolding moment-to-moment in real time, yet few studies have examined how spoken language interacts with gesture and visual context during multimodal language processing. Gesture is a rich communication cue that is integrally related to speech and often depicts concrete referents from the visual world. Using eye-tracking in an adapted visual world paradigm, we examined how participants with and without moderate-severe traumatic brain injury (TBI) use gesture to resolve temporary referential ambiguity.
METHODS: Participants viewed a screen with four objects and one video. The speaker in the video produced sentences (e.g., "The girl will eat the very good sandwich"), paired with either a meaningful gesture (e.g., a sandwich-holding gesture) or a meaningless grooming movement (e.g., an arm scratch) at the verb "will eat." We measured participants' gaze to the target object (e.g., sandwich), a semantic competitor (e.g., apple), and two unrelated distractors (e.g., piano, guitar) during the critical window between movement onset in the gesture modality and onset of the spoken referent in speech.
RESULTS: Participants both with and without TBI were more likely to fixate the target when the speaker produced a gesture compared to a grooming movement; however, relative to non-injured participants, the effect was significantly attenuated in the TBI group.
DISCUSSION: We demonstrated evidence of reduced speech-gesture integration in participants with TBI relative to non-injured peers. This study advances our understanding of the communicative abilities of adults with TBI and could lead to a more mechanistic account of the communication difficulties adults with TBI experience in rich communication contexts that require the processing and integration of multiple co-occurring cues. This work has the potential to increase the ecological validity of language assessment and to provide insights into the cognitive and neural mechanisms that support multimodal language processing.
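The dependent measure is the proportion of gaze samples on each area of interest during the critical window between movement onset and the onset of the spoken referent. A minimal sketch of that computation with pandas, assuming a long-format gaze table with illustrative column names (not the study's actual coding scheme):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 5000  # illustrative number of gaze samples

# Toy gaze samples: one row per sample, pooled over trials and participants.
gaze = pd.DataFrame({
    "group": rng.choice(["TBI", "non-injured"], size=n),
    "condition": rng.choice(["gesture", "grooming"], size=n),
    "time_ms": rng.integers(0, 2000, size=n),       # time from movement onset
    "aoi": rng.choice(["target", "competitor", "distractor"], size=n, p=[.4, .3, .3]),
    "referent_onset_ms": 1200,                      # onset of the spoken referent
})

# Critical window: from movement onset up to the onset of the spoken referent.
win = gaze[gaze["time_ms"] < gaze["referent_onset_ms"]]

# Proportion of samples on each area of interest, per group and condition.
props = (win.groupby(["group", "condition"])["aoi"]
            .value_counts(normalize=True)
            .rename("proportion")
            .reset_index())
print(props[props["aoi"] == "target"])
```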
Affiliation(s)
- Sharice Clough
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Multimodal Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
- Sarah Brown-Schmidt
- Department of Psychology and Human Development, Vanderbilt University, Nashville, Tennessee, USA
- Sun-Joo Cho
- Department of Psychology and Human Development, Vanderbilt University, Nashville, Tennessee, USA
- Melissa C Duff
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, USA
8
Ter Bekke M, Levinson SC, van Otterdijk L, Kühn M, Holler J. Visual bodily signals and conversational context benefit the anticipation of turn ends. Cognition 2024; 248:105806. [PMID: 38749291] [DOI: 10.1016/j.cognition.2024.105806]
Abstract
The typical pattern of alternating turns in conversation seems trivial at first sight. But a closer look quickly reveals the cognitive challenges involved, with much of it resulting from the fast-paced nature of conversation. One core ingredient to turn coordination is the anticipation of upcoming turn ends so as to be able to ready oneself for providing the next contribution. Across two experiments, we investigated two variables inherent to face-to-face conversation, the presence of visual bodily signals and preceding discourse context, in terms of their contribution to turn end anticipation. In a reaction time paradigm, participants anticipated conversational turn ends better when seeing the speaker and their visual bodily signals than when they did not, especially so for longer turns. Likewise, participants were better able to anticipate turn ends when they had access to the preceding discourse context than when they did not, and especially so for longer turns. Critically, the two variables did not interact, showing that visual bodily signals retain their influence even in the context of preceding discourse. In a pre-registered follow-up experiment, we manipulated the visibility of the speaker's head, eyes and upper body (i.e. torso + arms). Participants were better able to anticipate turn ends when the speaker's upper body was visible, suggesting a role for manual gestures in turn end anticipation. Together, these findings show that seeing the speaker during conversation may critically facilitate turn coordination in interaction.
Affiliation(s)
- Marlijn Ter Bekke
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, the Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Lina van Otterdijk
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, the Netherlands
- Michelle Kühn
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, the Netherlands
- Judith Holler
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, the Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands.
9
Dockendorff M, Schmitz L, Vesper C, Knoblich G. Communicative modulations of early action components support the prediction of distal goals. PLoS One 2024; 19:e0306072. [PMID: 38935629] [PMCID: PMC11210802] [DOI: 10.1371/journal.pone.0306072]
Abstract
The successful unfolding of many social interactions relies on our capacity to predict other people's action goals, whether these are proximal (i.e., immediate) or distal (i.e., upcoming). The present set of studies asks whether observers can predict the distal goal of two-step action sequences when presented with communicative modulations of the first movement component of the sequence. We conducted three online experiments in which we presented participants with animations of a box moving to a first target location before moving onwards to a final, either near or far, target location. The second movement component and the target locations were occluded. After observing the first movement, participants were asked to select the most likely final target location, i.e., the distal goal of the sequence. Experiment 1 showed that participants relied on the velocity modulations of the first movement to infer the distal goal. The results of Experiment 2 indicated that such predictions of distal goals are possible even when the second movement in the sequence does not contain any velocity information, thus suggesting that the information present in the first movement plays the major role in the process of linking movements to their distal goals. However, Experiment 3 showed that under some circumstances the second movement can also contribute to how observers predict a distal goal. We discuss these results in terms of the underlying simulation processes that enable observers to predict a distal goal from the observation of proximal communicative modulations.
Affiliation(s)
- Martin Dockendorff
- Department of Cognitive Science, Central European University, Vienna, Austria
- Laura Schmitz
- Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Cordula Vesper
- Department of Linguistics, Cognitive Science, and Semiotics, Aarhus University, Aarhus, Denmark
- Interacting Minds Centre, Aarhus University, Aarhus, Denmark
- Günther Knoblich
- Department of Cognitive Science, Central European University, Vienna, Austria
10
Hagoort P, Özyürek A. Extending the Architecture of Language From a Multimodal Perspective. Top Cogn Sci 2024. [PMID: 38493475] [DOI: 10.1111/tops.12728]
Abstract
Language is inherently multimodal. In spoken languages, combined spoken and visual signals (e.g., co-speech gestures) are an integral part of linguistic structure and language representation. This requires an extension of the parallel architecture, which needs to include the visual signals concomitant to speech. We present the evidence for the multimodality of language. In addition, we propose that distributional semantics might provide a format for integrating speech and co-speech gestures in a common semantic representation.
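The distributional-semantics proposal implies that speech and co-speech gesture could be projected into a shared vector space. A toy illustration of that idea (the vectors and the simple averaging scheme are assumptions for demonstration, not the authors' model):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(4)
dim = 50

# Toy distributional vectors; in practice these would come from a trained
# word-embedding model and from some encoding of gesture form and kinematics.
speech_vec = {"type": rng.standard_normal(dim), "write": rng.standard_normal(dim)}
gesture_vec = {"typing_gesture": rng.standard_normal(dim)}

# One simple way to form a common multimodal representation: average the
# spoken word's vector with the concurrently produced gesture's vector.
multimodal_type = (speech_vec["type"] + gesture_vec["typing_gesture"]) / 2

print("speech-only 'type' vs. multimodal:", round(cosine(speech_vec["type"], multimodal_type), 3))
print("other word 'write' vs. multimodal:", round(cosine(speech_vec["write"], multimodal_type), 3))
```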
Affiliation(s)
- Peter Hagoort
- Max Planck Institute for Psycholinguistics, Nijmegen
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen
- Aslı Özyürek
- Max Planck Institute for Psycholinguistics, Nijmegen
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen
11
Trujillo JP, Holler J. Conversational facial signals combine into compositional meanings that change the interpretation of speaker intentions. Sci Rep 2024; 14:2286. [PMID: 38280963] [PMCID: PMC10821935] [DOI: 10.1038/s41598-024-52589-0]
Abstract
Human language is extremely versatile, combining a limited set of signals in an unlimited number of ways. However, it is unknown whether conversational visual signals feed into the composite utterances with which speakers communicate their intentions. We assessed whether different combinations of visual signals lead to different intent interpretations of the same spoken utterance. Participants viewed a virtual avatar uttering spoken questions while producing single visual signals (i.e., head turn, head tilt, eyebrow raise) or combinations of these signals. After each video, participants classified the communicative intention behind the question. We found that composite utterances combining several visual signals conveyed different meanings compared to utterances accompanied by the single visual signals. However, responses to combinations of signals were more similar to the responses to related, rather than unrelated, individual signals, indicating a consistent influence of the individual visual signals on the whole. This study therefore provides the first evidence for compositional, non-additive (i.e., Gestalt-like) perception of multimodal language.
Affiliation(s)
- James P Trujillo
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, The Netherlands.
- Judith Holler
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, The Netherlands
12
Kauttonen J, Paekivi S, Kauramäki J, Tikka P. Unraveling dyadic psycho-physiology of social presence between strangers during an audio drama - a signal-analysis approach. Front Psychol 2023; 14:1153968. [PMID: 37928563] [PMCID: PMC10622809] [DOI: 10.3389/fpsyg.2023.1153968]
Abstract
A mere co-presence of an unfamiliar person may modulate an individual's attentive engagement with specific events or situations to a significant degree. To understand better how such social presence affects experiences, we recorded a set of parallel multimodal facial and psychophysiological data from subjects (N = 36) who listened to dramatic audio scenes either alone or while facing an unfamiliar person. Both a selection of 6-s affective sound clips (IADS-2) and a subsequent 27-min soundtrack extracted from a Finnish episode film depicted familiar and often intense social situations from the everyday world. Considering the systemic complexity of both the chosen naturalistic stimuli and the expected variations in the experimental social situation, we applied a novel combination of signal analysis methods: inter-subject correlation (ISC) analysis, Representational Similarity Analysis (RSA), and Recurrence Quantification Analysis (RQA), followed by gradient boosting classification. We report our findings concerning three facial signals, gaze, eyebrow, and smile, that can be linked to socially motivated facial movements. We found that ISC values of pairs, whether calculated on true pairs or on any two individuals who had a partner, were lower than in the group of single individuals. Thus, the audio stimuli induced more unique responses in those subjects who listened in the presence of another person, while individual listeners tended to yield a more uniform response driven by the dramatized audio stimulus alone. Furthermore, our classifier models trained on recurrence properties of the gaze, eyebrow, and smile signals demonstrated distinctive differences in the recurrence dynamics of signals from paired subjects and revealed the impact of individual differences on the latter. We showed that the presence of an unfamiliar co-listener, which modifies the social dynamics of dyadic listening tasks, can be detected reliably from visible facial modalities. By applying our analysis framework to a broader range of psychophysiological data, together with annotations of the content and subjective reports of participants, we expect more detailed dyadic dependencies to be revealed. Our work contributes towards modeling and predicting human social behaviors in specific types of audio-visually mediated, virtual, and live social situations.
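Of the methods combined here, inter-subject correlation (ISC) is the most compact to illustrate: for a given facial signal, it is the correlation between two listeners' time courses, aggregated over subject pairs. A minimal sketch on toy data (the group assignment and signal are simulated; RSA, RQA, and the gradient boosting step are not reproduced):

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(5)
n_subjects, n_samples = 36, 1620  # e.g., a 27-min signal sampled at 1 Hz

# Toy stand-in for one facial signal (e.g., smile intensity) per subject over time.
signal = rng.standard_normal((n_subjects, n_samples))

def pairwise_isc(data):
    """Pearson correlation of the time course for every pair of subjects."""
    r = np.corrcoef(data)  # (n_subjects, n_subjects) correlation matrix
    idx = np.array(list(combinations(range(len(data)), 2)))
    return r[idx[:, 0], idx[:, 1]]

# Compare ISC for subjects who listened with a partner vs. those who listened alone.
paired_idx = np.arange(0, 18)   # illustrative group assignment
single_idx = np.arange(18, 36)
print("mean ISC, paired listeners:", pairwise_isc(signal[paired_idx]).mean().round(3))
print("mean ISC, single listeners:", pairwise_isc(signal[single_idx]).mean().round(3))
```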
Affiliation(s)
- Janne Kauttonen
- Competences, RDI and Digitalization, Haaga-Helia University of Applied Sciences, Helsinki, Finland
- School of Arts, Design and Architecture, Aalto University, Espoo, Finland
- Aalto NeuroImaging, Aalto University, Espoo, Finland
- Sander Paekivi
- Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
- Jaakko Kauramäki
- School of Arts, Design and Architecture, Aalto University, Espoo, Finland
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Cognitive Brain Research Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Pia Tikka
- School of Arts, Design and Architecture, Aalto University, Espoo, Finland
- Enactive Virtuality Lab, Baltic Film, Media and Arts School (BFM), Centre of Excellence in Media Innovation and Digital Culture (MEDIT), Tallinn University, Tallinn, Estonia
13
Benetti S, Ferrari A, Pavani F. Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 2023; 17:1108354. [PMID: 36816496] [PMCID: PMC9932987] [DOI: 10.3389/fnhum.2023.1108354]
Abstract
In face-to-face communication, humans are faced with multiple layers of discontinuous multimodal signals, such as head, face, hand gestures, speech and non-speech sounds, which need to be interpreted as coherent and unified communicative actions. This implies a fundamental computational challenge: optimally binding only signals belonging to the same communicative action while segregating signals that are not connected by the communicative content. How do we achieve such an extraordinary feat, reliably, and efficiently? To address this question, we need to further move the study of human communication beyond speech-centred perspectives and promote a multimodal approach combined with interdisciplinary cooperation. Accordingly, we seek to reconcile two explanatory frameworks recently proposed in psycholinguistics and sensory neuroscience into a neurocognitive model of multimodal face-to-face communication. First, we introduce a psycholinguistic framework that characterises face-to-face communication at three parallel processing levels: multiplex signals, multimodal gestalts and multilevel predictions. Second, we consider the recent proposal of a lateral neural visual pathway specifically dedicated to the dynamic aspects of social perception and reconceive it from a multimodal perspective ("lateral processing pathway"). Third, we reconcile the two frameworks into a neurocognitive model that proposes how multiplex signals, multimodal gestalts, and multilevel predictions may be implemented along the lateral processing pathway. Finally, we advocate a multimodal and multidisciplinary research approach, combining state-of-the-art imaging techniques, computational modelling and artificial intelligence for future empirical testing of our model.
Affiliation(s)
- Stefania Benetti
- Centre for Mind/Brain Sciences, University of Trento, Trento, Italy; Interuniversity Research Centre “Cognition, Language, and Deafness”, CIRCLeS, Catania, Italy
- Ambra Ferrari
- Max Planck Institute for Psycholinguistics, Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, Netherlands
- Francesco Pavani
- Centre for Mind/Brain Sciences, University of Trento, Trento, Italy; Interuniversity Research Centre “Cognition, Language, and Deafness”, CIRCLeS, Catania, Italy