TY - GEN
T1 - Dynamic facial models for video-based dimensional affect estimation
AU - Song, Siyang
AU - Sanchez-Lozano, Enrique
AU - Tellamekala, Mani Kumar
AU - Shen, Linlin
AU - Johnston, Alan
AU - Valstar, Michel
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Dimensional affect estimation from a face video is a challenging task, mainly due to the large number of possible facial displays made up of a set of behaviour primitives including facial muscle actions. The displays vary not only in composition but also in temporal evolution, with each display composed of behaviour primitives that vary in their short- and long-term characteristics. Most existing work on affect modelling relies on complex hierarchical recurrent models that are unable to capture short-term dynamics well. In this paper, we propose to encode these short-term facial shape and appearance dynamics in an image, where only the semantically meaningful information is encoded into the dynamic face images. We also propose binary dynamic facial masks to remove 'stable pixels' from the dynamic images. This process filters out non-dynamic information, i.e. only pixels that have changed in the sequence are retained. The final proposed Dynamic Facial Model (DFM) then encodes both the filtered facial appearance and the shape dynamics of the image sequence preceding the given frame into a three-channel raster image. A CNN-RNN architecture is tasked with modelling primarily the long-term changes. Experiments show that our dynamic face images achieve superior performance over standard RGB face images on the dimensional affect prediction task.
AB - Dimensional affect estimation from a face video is a challenging task, mainly due to the large number of possible facial displays made up of a set of behaviour primitives including facial muscle actions. The displays vary not only in composition but also in temporal evolution, with each display composed of behaviour primitives that vary in their short- and long-term characteristics. Most existing work on affect modelling relies on complex hierarchical recurrent models that are unable to capture short-term dynamics well. In this paper, we propose to encode these short-term facial shape and appearance dynamics in an image, where only the semantically meaningful information is encoded into the dynamic face images. We also propose binary dynamic facial masks to remove 'stable pixels' from the dynamic images. This process filters out non-dynamic information, i.e. only pixels that have changed in the sequence are retained. The final proposed Dynamic Facial Model (DFM) then encodes both the filtered facial appearance and the shape dynamics of the image sequence preceding the given frame into a three-channel raster image. A CNN-RNN architecture is tasked with modelling primarily the long-term changes. Experiments show that our dynamic face images achieve superior performance over standard RGB face images on the dimensional affect prediction task.
KW - Deep learning
KW - Dimensional affect estimation
KW - Facial dynamic modelling
UR - http://www.scopus.com/inward/record.url?scp=85082497480&partnerID=8YFLogxK
U2 - 10.1109/ICCVW.2019.00200
DO - 10.1109/ICCVW.2019.00200
M3 - Conference contribution
AN - SCOPUS:85082497480
T3 - Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
SP - 1608
EP - 1617
BT - Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Y2 - 27 October 2019 through 28 October 2019
ER -