TY - GEN
T1 - Human behaviour-based automatic depression analysis using hand-crafted statistics and deep learned spectral features
AU - Song, Siyang
AU - Shen, Linlin
AU - Valstar, Michel
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/6/5
Y1 - 2018/6/5
N2 - Depression is a serious mental disorder that affects millions of people all over the world. Traditional clinical diagnosis methods are subjective, complicated, and require extensive participation of experts. Audio-visual automatic depression analysis systems predominantly base their predictions on very brief sequential segments, sometimes as short as a single frame. Such data contain much redundant information, incur a high computational load, and negatively affect detection accuracy. Final decision making at the sequence level is then based on the fusion of frame- or segment-level predictions. However, this approach loses longer-term behavioural correlations, as the behaviours themselves are abstracted away by the frame-level predictions. We propose instead to use automatically detected human behaviour primitives, such as gaze directions and facial action units (AUs), as low-dimensional multi-channel time-series data, from which two sequence descriptors are constructed. The first computes sequence-level statistics of the behaviour primitives; the second applies a Convolutional Neural Network to a spectral representation of the multi-channel behaviour signals. Depression detection (binary classification) and severity estimation (regression) experiments on the AVEC 2016 DAIC-WOZ database show that both methods achieve a significant improvement over the previous state of the art in depression severity estimation.
AB - Depression is a serious mental disorder that affects millions of people all over the world. Traditional clinical diagnosis methods are subjective, complicated, and require extensive participation of experts. Audio-visual automatic depression analysis systems predominantly base their predictions on very brief sequential segments, sometimes as short as a single frame. Such data contain much redundant information, incur a high computational load, and negatively affect detection accuracy. Final decision making at the sequence level is then based on the fusion of frame- or segment-level predictions. However, this approach loses longer-term behavioural correlations, as the behaviours themselves are abstracted away by the frame-level predictions. We propose instead to use automatically detected human behaviour primitives, such as gaze directions and facial action units (AUs), as low-dimensional multi-channel time-series data, from which two sequence descriptors are constructed. The first computes sequence-level statistics of the behaviour primitives; the second applies a Convolutional Neural Network to a spectral representation of the multi-channel behaviour signals. Depression detection (binary classification) and severity estimation (regression) experiments on the AVEC 2016 DAIC-WOZ database show that both methods achieve a significant improvement over the previous state of the art in depression severity estimation.
KW - Automatic depression analysis
KW - Convolutional Neural Networks
KW - Human behaviour signals
KW - Spectrum maps
KW - Statistics
UR - http://www.scopus.com/inward/record.url?scp=85049389756&partnerID=8YFLogxK
U2 - 10.1109/FG.2018.00032
DO - 10.1109/FG.2018.00032
M3 - Conference contribution
AN - SCOPUS:85049389756
T3 - Proceedings - 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018
SP - 158
EP - 165
BT - Proceedings - 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018
Y2 - 15 May 2018 through 19 May 2018
ER -