A Scene-Dependent Sound Event Detection Approach Using Multi-Task Learning

Han Liang, Wanting Ji, Ruili Wang, Yaxiong Ma, Jincai Chen, Min Chen

Research output: Journal Publication, Article, peer-review

9 Citations (Scopus)

Abstract

Sound event detection (SED) and acoustic scene classification (ASC) are two closely related key tasks in computational auditory scene analysis. For example, during sound event detection, scene information can be used to exclude sound events that are unlikely to occur in the given scene; in other words, scene information can improve the accuracy of sound event detection. However, existing works rarely take acoustic scene information into account when detecting sound events. Based on the internal relationship between sound events and scene information, this paper proposes a scene-dependent sound event detection (SDSED) approach, which combines scene information and sound event information using multi-task learning. In the proposed approach, the two tasks share a common feature representation. Meanwhile, a temporal attention mechanism is used to extract informative features from sound recordings. We test the proposed approach on the Synthetic Sound Scenes dataset. Experimental results show that it outperforms the state-of-the-art approaches. Compared with the referenced approach, it improves the segment-based F-score by 4.29% and reduces the segment-based error rate by 4.8%.
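
The abstract reports results as a segment-based F-score and a segment-based error rate but does not spell out how they are computed. The following is a minimal sketch, assuming the definitions commonly used in sound event detection evaluation: activity is compared per segment and per event class, and the error rate counts substitutions, deletions, and insertions relative to the number of active reference events. The function name `segment_based_scores` and the toy activity matrices are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def segment_based_scores(
    reference: Sequence[Sequence[int]],
    estimate: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Return (F-score, error rate) for binary [segment][event class] activity matrices."""
    tp = fp = fn = 0
    substitutions = deletions = insertions = 0
    n_reference = 0  # total number of active reference events over all segments

    for ref_seg, est_seg in zip(reference, estimate):
        seg_tp = sum(1 for r, e in zip(ref_seg, est_seg) if r and e)
        seg_fp = sum(1 for r, e in zip(ref_seg, est_seg) if not r and e)
        seg_fn = sum(1 for r, e in zip(ref_seg, est_seg) if r and not e)

        tp += seg_tp
        fp += seg_fp
        fn += seg_fn

        # Within one segment, a false positive paired with a false negative
        # counts as one substitution; what is left over is a deletion or insertion.
        substitutions += min(seg_fp, seg_fn)
        deletions += max(0, seg_fn - seg_fp)
        insertions += max(0, seg_fp - seg_fn)
        n_reference += sum(1 for r in ref_seg if r)

    denominator = 2 * tp + fp + fn
    f_score = 2 * tp / denominator if denominator else 0.0
    total_errors = substitutions + deletions + insertions
    error_rate = total_errors / n_reference if n_reference else 0.0
    return f_score, error_rate


# Toy data (hypothetical): 3 segments, 3 event classes.
reference = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
estimate = [[1, 0, 0], [1, 0, 1], [0, 0, 0]]
f_score, error_rate = segment_based_scores(reference, estimate)
print(f"F-score = {f_score:.4f}, error rate = {error_rate:.2f}")
```

In the toy data, the second segment contributes one substitution and the third one deletion, so a higher F-score and a lower error rate both indicate better detection, which is the direction of the improvements reported above.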

Original language: English
Pages (from-to): 17483-17489
Number of pages: 7
Journal: IEEE Sensors Journal
Volume: 22
Issue number: 18
Publication status: Published - 15 Sept 2022
Externally published: Yes

Keywords

  • Sound event detection
  • acoustic scene classification
  • convolutional recurrent neural network
  • multi-task learning
  • temporal attention

ASJC Scopus subject areas

  • Instrumentation
  • Electrical and Electronic Engineering
