Abstract
Sound event detection (SED) and acoustic scene classification (ASC) are two closely related tasks in the field of computational auditory scene analysis. For example, during sound event detection, scene information can be used to exclude sound events that are unlikely to occur in a given scene; in other words, scene information can improve the accuracy of sound event detection. However, existing work rarely exploits acoustic scene information when detecting sound events. Based on the inherent relationship between sound events and acoustic scenes, this paper proposes a scene-dependent sound event detection (SDSED) approach, which combines scene information and sound event information using multi-task learning. In the proposed approach, the two tasks share a common feature representation and are trained simultaneously. Meanwhile, a temporal attention mechanism is used to extract informative features from sound recordings. We evaluate the proposed approach on the Synthetic Sound Scenes dataset. Experimental results show that our proposed approach outperforms state-of-the-art approaches. Compared with the reference approach, our approach improves the segment-based F-score by 4.29% and reduces the segment-based error rate by 4.8%.
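The abstract describes a multi-task architecture in which SED and ASC share a common feature extractor, with temporal attention pooling for the clip-level scene prediction. Below is a minimal PyTorch sketch of that general idea; all layer sizes, the network depth, the exact attention form, and the class counts (`n_events`, `n_scenes`) are illustrative assumptions, not the authors' published architecture.

```python
# Sketch of a multi-task CRNN: a shared conv + GRU encoder feeds
# (a) a frame-level sound event head (SED) and
# (b) a clip-level scene head (ASC) pooled by temporal attention.
import torch
import torch.nn as nn

class SceneDependentSED(nn.Module):
    def __init__(self, n_mels=64, n_events=10, n_scenes=5, hidden=128):
        super().__init__()
        # Shared convolutional front end over a log-mel spectrogram
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((1, 4)),  # pool frequency only, keep time resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        feat = 64 * (n_mels // 16)
        # Shared recurrent layer models temporal context for both tasks
        self.gru = nn.GRU(feat, hidden, batch_first=True, bidirectional=True)
        # Frame-level sound event head (SED)
        self.event_head = nn.Linear(2 * hidden, n_events)
        # Temporal attention weights each frame before clip-level pooling
        self.attn = nn.Linear(2 * hidden, 1)
        # Clip-level acoustic scene head (ASC)
        self.scene_head = nn.Linear(2 * hidden, n_scenes)

    def forward(self, x):              # x: (batch, time, n_mels)
        h = self.conv(x.unsqueeze(1))  # (batch, 64, time, n_mels // 16)
        h = h.permute(0, 2, 1, 3).flatten(2)   # (batch, time, feat)
        h, _ = self.gru(h)                     # shared representation
        events = torch.sigmoid(self.event_head(h))   # frame-wise event probs
        w = torch.softmax(self.attn(h), dim=1)       # attention over time
        clip = (w * h).sum(dim=1)                    # attentive pooling
        scene = self.scene_head(clip)                # scene logits
        return events, scene
```

Under this sketch, a joint training objective would typically sum a binary cross-entropy loss over the frame-level event predictions and a cross-entropy loss over the scene logits, so that gradients from both tasks shape the shared encoder.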
Original language | English |
---|---|
Pages (from-to) | 17483-17489 |
Number of pages | 7 |
Journal | IEEE Sensors Journal |
Volume | 22 |
Issue number | 18 |
DOIs | |
Publication status | Published - 15 Sept 2022 |
Externally published | Yes |
Keywords
- Sound event detection
- Acoustic scene classification
- Convolutional recurrent neural network
- Multi-task learning
- Temporal attention
ASJC Scopus subject areas
- Instrumentation
- Electrical and Electronic Engineering