Spatiotemporal saliency-based multi-stream networks with attention-aware LSTM for action recognition

Zhenbing Liu, Zeya Li, Ruili Wang, Ming Zong, Wanting Ji

Research output: Journal Publication › Article › peer-review

29 Citations (Scopus)

Abstract

Human action recognition is the process of labeling video frames with action labels. It is a challenging research topic because the backgrounds of videos are usually chaotic, which degrades the performance of traditional human action recognition methods. In this paper, we propose novel spatiotemporal saliency-based multi-stream ResNets (STS), which combine three streams (i.e., a spatial stream, a temporal stream and a spatiotemporal saliency stream) for human action recognition. Further, we propose a novel spatiotemporal saliency-based multi-stream ResNets with attention-aware long short-term memory (STS-ALSTM) network. The proposed STS-ALSTM model combines deep convolutional neural network (CNN) feature extractors with three attention-aware LSTMs to capture the long-term temporal dependencies between consecutive video frames, optical flow frames or spatiotemporal saliency frames. Experimental results on the UCF-101 and HMDB-51 datasets demonstrate that our proposed STS method and STS-ALSTM model achieve competitive performance compared with state-of-the-art methods.
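To make the multi-stream idea concrete, the following is a minimal, hedged sketch of attention-weighted temporal pooling followed by late fusion of three stream descriptors (RGB, optical flow, spatiotemporal saliency). All names, the scoring vector `w`, and the equal fusion weights are illustrative assumptions, not the paper's actual STS-ALSTM implementation, which uses deep CNN feature extractors and attention-aware LSTMs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of frame scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, w):
    """Attention-aware temporal pooling for one stream.

    frames: list of T per-frame feature vectors (each a list of D floats),
            standing in for CNN features of RGB, flow or saliency frames.
    w:      D-dim scoring vector (a hypothetical stand-in for the attention
            parameters learned inside each attention-aware LSTM).
    Returns one D-dim clip-level descriptor.
    """
    scores = [sum(fi * wi for fi, wi in zip(f, w)) for f in frames]
    alpha = softmax(scores)  # attention weights over frames, sum to 1
    dims = len(frames[0])
    return [sum(a * f[d] for a, f in zip(alpha, frames)) for d in range(dims)]

def fuse_streams(descriptors, weights):
    """Late fusion: weighted sum of the per-stream clip descriptors."""
    dims = len(descriptors[0])
    return [sum(wt * desc[d] for wt, desc in zip(weights, descriptors))
            for d in range(dims)]

# Toy example: 3 streams x 4 frames x 2-dim features
rgb = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
flow = [[0.2, 0.1], [0.1, 0.2], [0.3, 0.3], [0.0, 0.1]]
sal = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.2], [0.6, 0.3]]
w = [1.0, -1.0]

descriptors = [attention_pool(s, w) for s in (rgb, flow, sal)]
clip = fuse_streams(descriptors, [1 / 3, 1 / 3, 1 / 3])
print(len(clip))  # 2
```

The attention weights let informative frames dominate each stream's descriptor; the final weighted sum mirrors late fusion of the spatial, temporal and saliency streams.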

Original language: English
Pages (from-to): 14593-14602
Number of pages: 10
Journal: Neural Computing and Applications
Volume: 32
Issue number: 18
DOIs
Publication status: Published - 1 Sept 2020
Externally published: Yes

Keywords

  • Action recognition
  • Attention-aware
  • LSTM
  • Multi-stream
  • Spatiotemporal saliency

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

