Addressee detection using facial and audio features in mixed human–human and human–robot settings: a deep learning framework

Fiseha Berhanu Tesema, Jason Gu, Wei Song, Hong Wu, Shiqiang Zhu, Zheyuan Lin, Min Huang, Wen Wang, Rajesh Kumar

Research output: Journal Publication, Article, peer-review

Abstract

Addressee detection (AD) lets a robot interact smoothly with humans by determining whether it is the one being addressed. However, the problem has not been widely explored. The few existing studies focused on human-to-human or human-to-robot conversations confined to a meeting room and relied on gaze and utterance cues. These works used statistical and rule-based approaches, which tend to depend on specific settings. Furthermore, they did not fully exploit the available audio and visual information or both short-term and long-term segments, and they did not explore combining two important conversational cues: facial and audio features. In addition, no audiovisual dataset with spatiotemporal annotations captured in mixed human-to-human and human-to-robot settings is available to support exploring the area with new approaches.
Original language: English
Article number: 22959594
Pages (from-to): 25-38
Journal: IEEE Systems, Man, and Cybernetics Magazine
Volume: 9
Issue number: 2
Publication status: Published 18 Apr 2023

Keywords

  • Deep learning
  • Visualization
  • Annotations
  • Input variables
  • Human-robot interaction
  • Oral communication
  • Predictive models

