Multiple Appropriate Facial Reaction Generation using Transformer Variational Auto-encoder

Introduction

Multiple Appropriate Facial Reaction Generation (MAFRG) is a new task in generative machine learning. In a conversation, given the audio and visual cues from the speaker and a reference image of the listener, a MAFRG model generates multiple appropriate listener reactions in the form of sequences of well-defined facial features, including action units, valence-arousal, emotion probabilities, and 3DMM features. Why do we need multiple reactions? According to Song et al., given the same input stimuli (the speaker's verbal and non-verbal behaviour), different listeners may react differently, and even the same listener may react differently in different contexts. MAFRG is therefore a one-to-many problem, and the authors who proposed the task have defined a set of requirements and corresponding evaluation metrics for it.

[Figure: the MAFRG task]

Competitions

Researchers from the University of Cambridge and Monash University organized the REACT 2023 competition to encourage teams in the community to develop ML/DL models for the MAFRG task on their benchmark (built on the RECOLA and NoXI datasets). The competition was a satellite event of ACM MM 2023. In 2024, they continued with the REACT 2024 competition, held in association with the IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024).

[Image: competition certificate]

Our methods

I am glad to have taken 3rd place in the 2024 competition. We achieved a much lower (better) Facial Reaction Distance than the baselines and other teams' methods: 11.68, against 90.31 for TransVAE and 91.45 for BeLFusion (see Results). In our method, we use a Gaussian Mixture of Models so that the Transformer-based Variational Auto-encoder (TransVAE) can generate more complex distributions. We also use the Multimodal Bottleneck Transformer to improve the extraction of the speaker's multimodal features. The detailed architecture of our proposed method is illustrated in the figures below. Our paper is released here in the conference booklet.
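To give a feel for why a Gaussian mixture helps with the one-to-many requirement, here is a minimal sketch of drawing several latent codes from a diagonal Gaussian mixture. It is not our actual implementation: the function names, the number of components, the latent size, and all parameter values are hypothetical, and in the real model the mixture parameters would be predicted by the network and each latent code would be passed to the decoder to produce one reaction sequence.

```python
import random


def sample_gmm_latent(weights, means, stds, rng):
    """Draw one latent vector from a diagonal Gaussian mixture (illustrative only)."""
    # Pick a mixture component with probability proportional to its weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Draw from the chosen diagonal Gaussian: z = mu + sigma * eps, eps ~ N(0, 1).
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means[k], stds[k])]
    return k, z


def sample_reaction_latents(weights, means, stds, n_reactions, seed=0):
    """Draw n_reactions latent codes; each would be decoded into one reaction."""
    rng = random.Random(seed)
    return [sample_gmm_latent(weights, means, stds, rng) for _ in range(n_reactions)]


# Hypothetical mixture: 3 components in a 2-dimensional latent space.
weights = [0.5, 0.3, 0.2]
means = [[-2.0, 0.0], [0.0, 2.0], [2.0, -1.0]]
stds = [[0.3, 0.3], [0.3, 0.3], [0.3, 0.3]]

latents = sample_reaction_latents(weights, means, stds, n_reactions=5)
for component, z in latents:
    print(component, [round(v, 3) for v in z])
```

With a single Gaussian, all samples cluster around one mode; with several components, different draws can land in clearly separated regions of the latent space, which is one way to obtain distinct but still plausible reactions to the same speaker input.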

[Figures: architecture of the proposed model]

Results

Model      FRCorr  FRDist  FRDiv   FRVar   FRDvs   FRRea  FRSyn
GT         8.73    0.00    0.0000  0.0724  0.2483  53.96  47.69
TransVAE   0.07    90.31   0.0064  0.0012  0.0009  69.19  44.65
BeLFusion  0.12    91.45   0.0112  0.0082  0.0120  -      44.89
Ours       0.03    11.68   0.0000  0.1006  0.1960  51.28  45.29
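FRDist is a distance, so lower is better, and the ground truth scores 0.00. As far as I understand the REACT evaluation protocol, it compares each generated reaction with the closest appropriate real reaction using dynamic time warping (DTW). Treat that description as my assumption; the authoritative definition is in the challenge's evaluation code. The sketch below shows the idea on toy sequences, with hypothetical function names and a plain Euclidean frame distance.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_to_closest_appropriate(generated, appropriate_set):
    """Distance from a generated reaction to its nearest appropriate real reaction."""
    return min(dtw_distance(generated, real) for real in appropriate_set)


# Toy 1-dimensional "facial feature" sequences (made-up values).
generated = [(0.0,), (1.0,), (2.0,)]
appropriate = [
    [(5.0,), (5.0,), (5.0,)],          # a very different reaction
    [(0.0,), (1.0,), (1.0,), (2.0,)],  # same shape, slightly slower
]
print(distance_to_closest_appropriate(generated, appropriate))
```

Because DTW allows frames to be stretched in time, a generated reaction that follows the same trajectory as a real one at a slightly different pace is not penalized.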