Abstract
Human action recognition is the task of labeling video sequences with action labels. It is a challenging research topic because video backgrounds are often cluttered, which degrades the performance of traditional human action recognition methods. In this paper, we propose a novel spatiotemporal saliency-based multi-stream ResNet (STS), which combines three streams (i.e., a spatial stream, a temporal stream and a spatiotemporal saliency stream) for human action recognition. Further, we propose a novel spatiotemporal saliency-based multi-stream ResNet with attention-aware long short-term memory (STS-ALSTM) network. The proposed STS-ALSTM model combines deep convolutional neural network (CNN) feature extractors with three attention-aware LSTMs to capture the long-term temporal dependencies between consecutive video frames, optical flow frames and spatiotemporal saliency frames. Experimental results on the UCF-101 and HMDB-51 datasets demonstrate that the proposed STS method and STS-ALSTM model obtain competitive performance compared with state-of-the-art methods.
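The fusion idea described above (three per-frame feature streams, temporal attention weighting, and combination into a single class prediction) can be sketched in plain NumPy. This is a minimal illustration, not the paper's model: the dot-product attention parameterization, the averaging late fusion, and all variable names are assumptions, and the LSTM recurrence of the actual STS-ALSTM is omitted in favor of simple attention pooling over frames.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(features, w):
    # features: (T, D) per-frame CNN features for one stream
    # w: (D,) attention projection (hypothetical parameterization;
    #    the paper uses attention-aware LSTMs instead)
    scores = features @ w              # (T,) relevance score per frame
    alpha = softmax(scores)            # attention weights over time
    return alpha @ features            # (D,) attention-weighted descriptor

def fuse_streams(stream_feats, attn_ws, classifier):
    # stream_feats: list of (T, D) arrays, e.g. RGB, optical flow,
    #               and spatiotemporal saliency features
    # classifier: (C, D) shared linear classifier (assumption; each
    #             stream could also have its own)
    pooled = [attention_pool(f, w) for f, w in zip(stream_feats, attn_ws)]
    logits = [classifier @ p for p in pooled]      # (C,) per stream
    return softmax(np.mean(logits, axis=0))        # late fusion by averaging
```

For example, with T = 8 frames, D = 16 feature dimensions and C = 5 action classes, `fuse_streams` returns a length-5 probability vector over actions.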
Original language | English |
---|---|
Pages (from-to) | 14593-14602 |
Number of pages | 10 |
Journal | Neural Computing and Applications |
Volume | 32 |
Issue number | 18 |
DOIs | |
Publication status | Published - 1 Sept 2020 |
Externally published | Yes |
Keywords
- Action recognition
- Attention-aware
- LSTM
- Multi-stream
- Spatiotemporal saliency
ASJC Scopus subject areas
- Software
- Artificial Intelligence