Jim Tørresen

Professor, Department of Informatics, University of Oslo

MishMash role: Member · WP1, WP3, WP7


Latest results

  1. Journal article, 2026

    Investigating Auditory–Visual Perception Using Multi-Modal Neural Networks with the SoundActions Dataset

    Jinyue Guo; Jim Tørresen; Alexander Refsum Jensenius

    Musicologists, psychologists, and computer scientists study relationships between auditory and visual stimuli from very different perspectives and using various terminologies and methodologies. This article aims to bridge the gap between phenomenological sound theory, auditory–visual theory, and audio–video processing and machine learning. We introduce the SoundActions dataset, a collection of 365 audio–video recordings of (primarily) short sound actions. Each recording has been human‑labeled and annotated according to Pierre Schaeffer’s theory of reduced listening, which describes the property of the sound itself (e.g., ‘an impulsive sound’) instead of the source (e.g., ‘a bird sound’). With these reduced‑type labels in the audio–video dataset, we conducted two experiments: (1) fine‑tuning the latest audio–video transformer model on the reduced‑type labels in the SoundActions dataset, proving that the model can recognize reduced‑type labels, and observing that the modality‑imbalance phenomenon is similar to the added value theory by Michel Chion and (2) proposing the Ensemble of Perception Mode Adapters method inspired by Pierre Schaeffer’s three listening modes, improving the audio–video model also on reduced‑type tasks.

More results in NVA…