Multiple Appropriate Facial Reaction Generation
Multiple Appropriate Facial Reaction Generation using Transformer Variational Auto-encoder
Introduction
The Multiple Appropriate Facial Reaction Generation (MAFRG) is a new task in Generative Machine Learning domain. In a conversation, given the audio and visual clues from the speaker and a reference image of the listener, a MAFRG model generates multiple appropriate listner’s reactions in the form of sequences of well-defined facial features including the action units, valence-arousal, emotion probability, and 3DMM features. Then, why do we need multiple reactions. According to Song et al., given the same input stimuli (speaker verbal and non-verbal input),different listeners may have different reactions and even the same listener may have different reactions in different contexts. MAFRG is a one-to-many problem and has a set of requirements and corresponding evaluation metrics well-defined by the authors proposing this task.
Competitions
Researchers from University of Cambridge and Monash University have conducted REACT 2023 competition to encourage teams in the community to develop ML/DL model to accomplish MAFRG task on their benchmark (RECOLA+NOXI datasets). The competition is a satellite event of ACM MM 2023. In 2024, they continue organizing the REACT 2024 competition associating with the IEEE Facial Gesture Conference 2024.
Our methods
I am glad that I got the 3rd place in the 20243 competition. We achieved an outstanding Facial Reaction Distance compared to the baseline and other teams’ methods. In our method, we use Gaussian Mixture of Models to improve the ability of generating complex distribution of the Transformer-based Variational Auto-encoder (TransVAE). We also utilize the Multimodal Bottleneck Transformer to improve the speaker multimodal feature extraction. The detailed architecture of our proposed method is illustrated in below figures. Our paper is released here in the conference booklet.
Results
Model | FRCorr | FRDist | FRDiv | FRVar | FRDvs | FRRea | FRSyn |
---|---|---|---|---|---|---|---|
GT | 8.73 | 0.00 | 0.0000 | 0.0724 | 0.2483 | 53.96 | 47.69 |
TransVAE | 0.07 | 90.31 | 0.0064 | 0.0012 | 0.0009 | 69.19 | 44.65 |
BeLFusion | 0.12 | 91.45 | 0.0112 | 0.0082 | 0.0120 | - | 44.89 |
Ours | 0.03 | 11.68 | 0.0000 | 0.1006 | 0.1960 | 51.28 | 45.29 |