Abstract
Abnormal behavior detection in surveillance videos is necessary for public monitoring and safety. Human-based surveillance systems require continuous attention and observation, which is a demanding task, so the autonomous detection of such events is of great importance. However, due to the scarcity of labeled data and the low occurrence probability of these events, abnormal event detection is a challenging vision problem. In this paper, we introduce a novel two-stage architecture for detecting anomalous behavior in videos. In the first stage, we propose a 3D Convolutional Autoencoder (3D-CAE) architecture to extract spatio-temporal features from training videos of normal events. In the 3D-CAE, both the encoder and the decoder are built from 3D convolutions, which learn appearance and motion features effectively in an unsupervised manner. In the second stage, we group the 3D spatio-temporal features into normality clusters and remove the sparse clusters, so that the remaining clusters represent stronger patterns of normality. A one-class SVM classifier is then used on these clusters to distinguish between normal and abnormal events based on normality scores. Experimental results on four benchmark datasets show significant performance improvements over state-of-the-art approaches while providing results in real time.
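The sketch below illustrates the two-stage idea described in the abstract, under assumed settings: the clip size, channel widths, class names (`CAE3D`, `train_stage1`, `fit_stage2`), and one-class SVM hyperparameters are illustrative rather than the authors' configuration, and the intermediate clustering step (grouping latent features into normality clusters and discarding sparse ones) is omitted for brevity. Stage 1 trains a 3D convolutional autoencoder on normal clips only with a reconstruction loss; Stage 2 fits a one-class SVM on the learned spatio-temporal latent features and uses its decision function as a normality score.

```python
# Minimal sketch of a 3D-CAE + one-class SVM pipeline (assumed settings, not the paper's exact setup).
import torch
import torch.nn as nn
from sklearn.svm import OneClassSVM


class CAE3D(nn.Module):
    """3D convolutional autoencoder: the encoder compresses a clip into a
    spatio-temporal latent volume; the decoder reconstructs the clip from it."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1),   # halves T, H, W
            nn.ReLU(inplace=True),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1),  # halves again
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


def train_stage1(model, normal_clips, epochs=10, lr=1e-3):
    """Stage 1: unsupervised training on normal clips with a reconstruction (MSE) loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for clip in normal_clips:            # clip: (batch, 1, T, H, W), values in [0, 1]
            recon, _ = model(clip)
            loss = loss_fn(recon, clip)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model


def fit_stage2(model, normal_clips):
    """Stage 2 (simplified): fit a one-class SVM on flattened latent features of normal clips."""
    model.eval()
    feats = []
    with torch.no_grad():
        for clip in normal_clips:
            _, z = model(clip)
            feats.append(z.flatten(start_dim=1))
    feats = torch.cat(feats).numpy()
    return OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(feats)


if __name__ == "__main__":
    # Toy stand-in data: 8 "normal" clips of 16 grayscale 64x64 frames each.
    clips = [torch.rand(1, 1, 16, 64, 64) for _ in range(8)]
    cae = train_stage1(CAE3D(), clips, epochs=2)
    ocsvm = fit_stage2(cae, clips)

    # At test time, decision_function acts as a normality score;
    # low or negative values indicate likely anomalous clips.
    with torch.no_grad():
        _, z = cae(torch.rand(1, 1, 16, 64, 64))
    print(ocsvm.decision_function(z.flatten(start_dim=1).numpy()))
```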
| Original language | English |
| --- | --- |
| Article number | 103047 |
| Journal | Journal of Visual Communication and Image Representation |
| Volume | 75 |
| DOIs | |
| Publication status | Published - Feb 2021 |
| Externally published | Yes |
Keywords
- 3D-CAE
- Anomaly detection
- Autonomous video surveillance
- Spatiotemporal latent features
- Video analysis
ASJC Scopus subject areas
- Signal Processing
- Media Technology
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering