In:
The Journal of Supercomputing, Springer Science and Business Media LLC, Vol. 77, No. 10 (2021-10), pp. 10773-10790
Abstract:
In this study, we present a fusion model for emotion recognition based on visual data. The proposed model takes video as input and generates an emotion label for each video sample. From the video data, we first select the most significant face regions through a face detection and selection step. We then employ three CNN-based architectures to extract high-level features from the face image sequence, and attach an additional module to each architecture to capture the sequential information across the entire video. Combining the three CNN-based models in a late-fusion approach yields results competitive with the baseline on two public datasets: AFEW 2016 and SAVEE.
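The late-fusion step described in the abstract can be illustrated with a minimal sketch: each of the three CNN-based branches produces a per-video class-probability vector, and the final label is taken from their combined (here, equally weighted) average. The function name, the weighting scheme, and the seven-class setup are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of late fusion over three model outputs (assumption:
# equal weights and a seven-class emotion label set).
import numpy as np

NUM_CLASSES = 7  # assumed number of emotion categories


def late_fusion(probs_a, probs_b, probs_c, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Average per-branch probability vectors and return the fused label."""
    stacked = np.stack([probs_a, probs_b, probs_c])       # shape (3, NUM_CLASSES)
    fused = np.average(stacked, axis=0, weights=weights)  # shape (NUM_CLASSES,)
    return int(np.argmax(fused)), fused


# Toy example: three branches that mostly agree on class 0.
a = np.array([0.6, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0])
b = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0])
c = np.array([0.2, 0.6, 0.2, 0.0, 0.0, 0.0, 0.0])
label, fused = late_fusion(a, b, c)
```

In practice the weights could be tuned on a validation split, but a plain average is the simplest late-fusion baseline.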
Type of Medium:
Online Resource
ISSN:
0920-8542, 1573-0484
DOI:
10.1007/s11227-021-03690-y
Language:
English
Publisher:
Springer Science and Business Media LLC
Publication Date:
2021
ZDB ID:
1479917-0