Confidence-Based Event-Centric Online Video Question Answering on a Newly Constructed ATBS Dataset

Weikai Kong, Shuhong Ye, Chenglin Yao, Jianfeng Ren

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)


Deep neural networks facilitate video question answering (VideoQA), but real-world applications on video streams, such as CCTV and live broadcasts, place higher demands on the solver. To address the challenges of VideoQA on long videos of unknown length, we define a new class of problems called Online Open-ended Video Question Answering (O2VQA). It requires an online state-updating mechanism that lets the solver decide whether the information collected so far is sufficient to conclude an answer. We then propose a Confidence-based Event-centric Online Video Question Answering (CEO-VQA) model to solve this problem. Furthermore, we construct a dataset called Answer Target in Background Stream (ATBS) to evaluate this newly developed online VideoQA application. Experimental results show that the proposed method achieves a significant performance gain over the baseline VideoQA method, which watches the whole video.
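The online decision described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CEO-VQA model: the `step` interface, the `threshold` value, and the toy solver are all hypothetical, chosen only to show how a solver might update its state segment by segment and stop once its confidence is high enough.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical solver interface (an assumption, not from the paper):
# given the running state and the next video segment, return the
# updated state, a candidate answer, and a confidence in [0, 1].
Step = Callable[[dict, object], Tuple[dict, str, float]]


def answer_online(
    segments: Iterable[object], step: Step, threshold: float = 0.8
) -> Tuple[Optional[str], float, int]:
    """Consume a stream and stop as soon as confidence reaches threshold.

    Returns (answer, confidence, number_of_segments_seen). If the stream
    ends first, the most confident answer seen so far is returned.
    """
    state: dict = {}
    best: Optional[Tuple[str, float]] = None
    seen = 0
    for seen, segment in enumerate(segments, start=1):
        state, answer, confidence = step(state, segment)
        if best is None or confidence > best[1]:
            best = (answer, confidence)
        if confidence >= threshold:
            # Enough evidence collected: answer without watching the rest.
            return answer, confidence, seen
    if best is None:
        return None, 0.0, 0  # empty stream
    return best[0], best[1], seen


def toy_step(state: dict, segment: object) -> Tuple[dict, str, float]:
    """Toy solver: confidence grows with the number of 'target' segments."""
    hits = state.get("hits", 0) + (1 if segment == "target" else 0)
    answer = "target seen" if hits else "unknown"
    return {"hits": hits}, answer, min(1.0, hits / 2)


stream = ["background", "target", "background", "target", "background"]
result = answer_online(stream, toy_step, threshold=0.8)
```

In this toy run the loop stops after the fourth segment, so the last background segment is never processed. That early exit is the point of the online setting: the stream length is unknown, so the solver cannot wait for the video to end.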

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
Publication status: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 → 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149


Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
City: Rhodes Island


  • Online Video Question Answering
  • Open-ended VideoQA
  • Video Understanding
  • VideoQA

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

