User video summarization based on joint visual and semantic affinity graph

Zhuo Lei, Ke Sun, Qian Zhang, Guoping Qiu

Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

8 Citations (Scopus)

Abstract

Automatically generating summaries of user-generated videos is very useful but challenging. User-generated videos are unedited and usually contain only a single long shot, which makes traditional video temporal segmentation methods such as shot boundary detection ineffective in producing meaningful video segments for summarization. To address this issue, we propose a novel temporal segmentation framework based on clustering a joint visual and semantic affinity graph of the video frames. Using a pre-trained deep convolutional neural network (CNN), we extract deep visual features of the frames to construct the visual affinity graph. We then construct the semantic affinity graph of the frames based on word embeddings of the frames' semantic tags, which are generated by an automatic image tagging algorithm. A dense neighbor method is then used to cluster the joint visual and semantic affinity graph, dividing the video into subshot-level segments from which a summary of the video can be generated. Experimental results show that our approach outperforms state-of-the-art methods. Furthermore, we show that the method achieves results that are similar to those produced manually.
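The pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy feature vectors stand in for CNN features and tag word embeddings, cosine similarity is assumed as the affinity measure, the weighted sum in `joint_affinity` (with the hypothetical parameter `alpha`) is an assumed fusion rule, and `segment` is a simple threshold on consecutive-frame affinity that stands in for the dense neighbor clustering step, which the abstract does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def affinity_graph(features):
    """Dense affinity matrix: entry [i][j] is the cosine similarity of frames i and j."""
    return [[cosine(fi, fj) for fj in features] for fi in features]


def joint_affinity(visual, semantic, alpha=0.5):
    """Assumed fusion rule: convex combination of the two affinity graphs."""
    return [
        [alpha * v + (1.0 - alpha) * s for v, s in zip(row_v, row_s)]
        for row_v, row_s in zip(visual, semantic)
    ]


def segment(joint, threshold=0.5):
    """Stand-in for the clustering step (not the dense neighbor method):
    start a new segment wherever the joint affinity between consecutive
    frames falls below the threshold."""
    if not joint:
        return []
    segments = [[0]]
    for i in range(1, len(joint)):
        if joint[i - 1][i] < threshold:
            segments.append([i])
        else:
            segments[-1].append(i)
    return segments


# Toy stand-ins: four frames, 2-D "visual features" and 2-D "tag embeddings".
visual_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
tag_embeds = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

joint = joint_affinity(affinity_graph(visual_feats), affinity_graph(tag_embeds))
segments = segment(joint)
print(segments)  # [[0, 1], [2, 3]]
```

In this toy input, frames 0 and 1 agree both visually and semantically, as do frames 2 and 3, so the only cut falls between frames 1 and 2. A real system would replace the toy vectors with features from a pre-trained CNN and embeddings of automatically generated tags, and the thresholding step with a graph clustering method.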

Original language: English
Title of host publication: Iv and L-MM 2016 - Proceedings of the 2016 ACM Workshop on Vision and Language Integration Meets Multimedia Fusion, co-located with ACM Multimedia 2016
Publisher: Association for Computing Machinery, Inc
Pages: 45-52
Number of pages: 8
ISBN (Electronic): 9781450345194
Publication status: Published - 16 Oct 2016
Event: 2016 ACM Workshop on Vision and Language Integration Meets Multimedia Fusion, Iv and L-MM 2016 - Amsterdam, Netherlands
Duration: 16 Oct 2016 → …

Publication series

Name: Iv and L-MM 2016 - Proceedings of the 2016 ACM Workshop on Vision and Language Integration Meets Multimedia Fusion, co-located with ACM Multimedia 2016

Conference

Conference: 2016 ACM Workshop on Vision and Language Integration Meets Multimedia Fusion, Iv and L-MM 2016
Country/Territory: Netherlands
City: Amsterdam
Period: 16/10/16 → …

Keywords

  • Clustering
  • Joint affinity graph
  • User-generated video
  • Video summarization
  • Video temporal segmentation

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
