A Scene-Dependent Sound Event Detection Approach Using Multi-Task Learning

Han Liang; Wanting Ji; Ruili Wang; Yaxiong Ma; Jincai Chen; Min Chen

doi:10.1109/JSEN.2021.3098325

Research output: Journal Publication › Article › peer-review

9 Citations (Scopus)

Abstract

Sound event detection (SED) and acoustic scene classification (ASC) are two key tasks related to each other in the field of computational auditory scene analysis. For example, during sound event detection, scene information can be used to exclude sound events that are unlikely to occur in this scene. In other words, scene information can improve the accuracy of sound event detection. However, existing works rarely detect sound events by considering acoustic scene information. Based on the internal relationship between sound events and scene information, this paper proposes a scene-dependent sound event detection (SDSED) approach, which combines scene information and sound event information using multi-task learning. In the proposed approach, we share a common feature representation for the two tasks simultaneously. Meanwhile, a temporal attention mechanism is used to extract informative features from sound recordings. We test the proposed approach on the Synthetic Sound Scenes dataset. Experimental results show that our proposed approach outperforms the state-of-the-art approaches. Compared with the referenced approach, our approach improves the segment-based F-score by 4.29% and reduces the segment-based error rate by 4.8%.
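The abstract reports results as a segment-based F-score and a segment-based error rate. The sketch below illustrates how these two metrics are commonly defined for polyphonic sound event detection. It is not taken from the paper: the function name `segment_based_scores`, the binary activity-matrix input, and the toy data are assumptions made for illustration, and the segment length and evaluation details used by the authors are not stated in this record.

```python
from typing import List, Tuple


def segment_based_scores(
    reference: List[List[int]], predicted: List[List[int]]
) -> Tuple[float, float]:
    """Return (F-score, error rate) accumulated over all segments.

    reference[k][c] and predicted[k][c] are 1 if event class c is
    active in segment k, and 0 otherwise (hypothetical input layout).
    """
    tp = fp = fn = 0
    substitutions = deletions = insertions = 0
    n_reference = 0  # total number of active reference events

    for ref_seg, pred_seg in zip(reference, predicted):
        seg_tp = sum(1 for r, p in zip(ref_seg, pred_seg) if r and p)
        seg_fp = sum(1 for r, p in zip(ref_seg, pred_seg) if p and not r)
        seg_fn = sum(1 for r, p in zip(ref_seg, pred_seg) if r and not p)

        tp += seg_tp
        fp += seg_fp
        fn += seg_fn

        # Within one segment, a miss paired with a false alarm counts
        # as one substitution; leftovers are deletions or insertions.
        substitutions += min(seg_fp, seg_fn)
        deletions += max(0, seg_fn - seg_fp)
        insertions += max(0, seg_fp - seg_fn)
        n_reference += sum(ref_seg)

    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 0.0
    errors = substitutions + deletions + insertions
    error_rate = errors / n_reference if n_reference else 0.0
    return f_score, error_rate


# Toy example: 3 segments, 3 event classes (made-up data).
reference = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
predicted = [[1, 0, 0], [1, 0, 1], [0, 0, 0]]
f_score, error_rate = segment_based_scores(reference, predicted)
print(f"F-score: {f_score:.4f}, error rate: {error_rate:.2f}")
```

A higher F-score and a lower error rate both indicate better detection, which is why the abstract describes one as improved and the other as reduced.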

Original language	English
Pages (from-to)	17483-17489
Number of pages	7
Journal	IEEE Sensors Journal
Volume	22
Issue number	18
DOIs	https://doi.org/10.1109/JSEN.2021.3098325
Publication status	Published - 15 Sept 2022
Externally published	Yes

Keywords

Sound event detection
acoustic scene classification
convolutional recurrent neural network
multi-task learning
temporal attention

ASJC Scopus subject areas

Instrumentation
Electrical and Electronic Engineering

Access to Document

10.1109/JSEN.2021.3098325

Cite this

@article{b262ffb13e42431f911824152bfda06c,

title = "A Scene-Dependent Sound Event Detection Approach Using Multi-Task Learning",

abstract = "Sound event detection (SED) and acoustic scene classification (ASC) are two key tasks related to each other in the field of computational auditory scene analysis. For example, during sound event detection, scene information can be used to exclude sound events that are unlikely to occur in this scene. In other words, scene information can improve the accuracy of sound event detection. However, existing works rarely detect sound events by considering acoustic scene information. Based on the internal relationship between sound events and scene information, this paper proposes a scene-dependent sound event detection (SDSED) approach, which combines scene information and sound event information using multi-task learning. In the proposed approach, we share common feature representation for the two tasks simultaneously. Meanwhile, a temporal attention mechanism is used to extract informative features from sound recordings. We test the proposed approach on Synthetic Sound Scenes dataset. Experimental results show that our proposed approach outperforms the state-of-the-art approaches. Compared with the referenced approach, our approach improves the segment-based F-score by 4.29\% and reduces the segment-based error rate by 4.8\%.",

keywords = "Sound event detection, acoustic scene classification, convolutional recurrent neural network, multi-task learning, temporal attention",

author = "Han Liang and Wanting Ji and Ruili Wang and Yaxiong Ma and Jincai Chen and Min Chen",

note = "Publisher Copyright: {\textcopyright} 2001-2012 IEEE.",

year = "2022",

month = sep,

day = "15",

doi = "10.1109/JSEN.2021.3098325",

language = "English",

volume = "22",

pages = "17483--17489",

journal = "IEEE Sensors Journal",

issn = "1530-437X",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "18",

}

TY - JOUR

T1 - A Scene-Dependent Sound Event Detection Approach Using Multi-Task Learning

AU - Liang, Han

AU - Ji, Wanting

AU - Wang, Ruili

AU - Ma, Yaxiong

AU - Chen, Jincai

AU - Chen, Min

PY - 2022/9/15

Y1 - 2022/9/15

N2 - Sound event detection (SED) and acoustic scene classification (ASC) are two key tasks related to each other in the field of computational auditory scene analysis. For example, during sound event detection, scene information can be used to exclude sound events that are unlikely to occur in this scene. In other words, scene information can improve the accuracy of sound event detection. However, existing works rarely detect sound events by considering acoustic scene information. Based on the internal relationship between sound events and scene information, this paper proposes a scene-dependent sound event detection (SDSED) approach, which combines scene information and sound event information using multi-task learning. In the proposed approach, we share common feature representation for the two tasks simultaneously. Meanwhile, a temporal attention mechanism is used to extract informative features from sound recordings. We test the proposed approach on Synthetic Sound Scenes dataset. Experimental results show that our proposed approach outperforms the state-of-the-art approaches. Compared with the referenced approach, our approach improves the segment-based F-score by 4.29% and reduces the segment-based error rate by 4.8%.

AB - Sound event detection (SED) and acoustic scene classification (ASC) are two key tasks related to each other in the field of computational auditory scene analysis. For example, during sound event detection, scene information can be used to exclude sound events that are unlikely to occur in this scene. In other words, scene information can improve the accuracy of sound event detection. However, existing works rarely detect sound events by considering acoustic scene information. Based on the internal relationship between sound events and scene information, this paper proposes a scene-dependent sound event detection (SDSED) approach, which combines scene information and sound event information using multi-task learning. In the proposed approach, we share common feature representation for the two tasks simultaneously. Meanwhile, a temporal attention mechanism is used to extract informative features from sound recordings. We test the proposed approach on Synthetic Sound Scenes dataset. Experimental results show that our proposed approach outperforms the state-of-the-art approaches. Compared with the referenced approach, our approach improves the segment-based F-score by 4.29% and reduces the segment-based error rate by 4.8%.

KW - Sound event detection

KW - acoustic scene classification

KW - convolutional recurrent neural network

KW - multi-task learning

KW - temporal attention

UR - http://www.scopus.com/inward/record.url?scp=85111068582&partnerID=8YFLogxK

U2 - 10.1109/JSEN.2021.3098325

DO - 10.1109/JSEN.2021.3098325

M3 - Article

AN - SCOPUS:85111068582

SN - 1530-437X

VL - 22

SP - 17483

EP - 17489

JO - IEEE Sensors Journal

JF - IEEE Sensors Journal

IS - 18

ER -
