Reading Notes: Facial Expression Recognition by De-expression Residue Learning

Posted 2020-01-06 | Updated 2022-02-13 | Category: Facial Expression Recognition

Overview

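This CVPR 2018 paper takes an unusual route to facial expression recognition: "de-expression". Drawing on prior observations and literature, the authors note that a facial expression can be decomposed into a neutral component and an expressive component. Their idea is to pass a face image through a GAN to obtain the corresponding neutral-expression face, then train on the residue (the leftover expressive features) and use what is learned to classify the expression.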

Link to the original paper

The original paper does not come with source code.

Challenge

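Most current research focuses on how illumination, pose, occlusion, and similar factors affect expression recognition. The authors instead look at individual differences such as age, gender, and ethnic background. In the paper's words: "the current main challenge comes from the large variations of individuals in attributes such as: age, gender, ethnic background and personality."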

Inspiration

people are capable of recognizing facial expressions by comparing a subject’s expression with a reference expression (i.e., neutral expression) of the same subject[1].
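a facial expression can be decomposed to an expressive component and neutral component[2]

In other words, people can recognize an expression by comparing it against a reference expression (here, the neutral expression), and a facial expression can be split into a neutral part and an expressive part, as shown in the figure.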

Network architecture

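The network has two main parts. The first is a GAN generator that produces the neutral expression and retains the residue, which is then used for training. The second part learns from the residue features and performs expression classification. Many details of the architecture are not shown: for example, the structure of the residue-learning network and the five loss functions are not described in detail, and the paper does not elaborate on them either. The main takeaway here is the idea of decomposing an expression.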
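The two-part structure can be sketched in a few lines of plain Python. This is only a toy illustration, not the authors' implementation: it assumes the residue is whatever the generator's intermediate layers hold during the forward pass, and the names `run_generator` and `classify_residue`, the list-based "layers", and the linear classifier are all hypothetical stand-ins for the real convolutional networks.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_generator(image: Vector, layers: Sequence[Layer]) -> Tuple[Vector, List[Vector]]:
    """Part 1 (toy): run the generator and keep every intermediate activation.

    Returns the final output (standing in for the neutral face) and the
    activations of all layers before the last one (standing in for the residue).
    """
    activations: List[Vector] = []
    x = image
    for layer in layers:
        x = layer(x)
        activations.append(x)
    neutral = activations[-1]
    residues = activations[:-1]
    return neutral, residues


def classify_residue(residues: List[Vector], weights: List[Vector], labels: List[str]) -> str:
    """Part 2 (toy): score the concatenated residue with one weight vector per label."""
    features = [value for residue in residues for value in residue]
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in weights]
    return labels[scores.index(max(scores))]


if __name__ == "__main__":
    # Hypothetical two-layer "generator" acting on a two-pixel "image".
    toy_layers = [
        lambda x: [v * 0.5 for v in x],
        lambda x: [v + 1.0 for v in x],
    ]
    neutral, residues = run_generator([2.0, 4.0], toy_layers)
    label = classify_residue(residues, [[1.0, 0.0], [0.0, 1.0]], ["happy", "sad"])
    print(neutral, residues, label)
```

The point of the sketch is the data flow: the classifier never sees the generated neutral face, only what the generator held on the way to producing it.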

Generator

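A cGAN[3] is used to generate a neutral facial expression from a given image. The training input of the cGAN is an image pair $< I_{input}, I_{target} >$, and the generator's output is $I_{output}$. Here $I_{target}$ is the ground-truth neutral expression for the input image, and $I_{output}$ is the neutral expression produced by the GAN.

The discriminator tries to distinguish the $< I_{input}, I_{target} >$ from the $< I_{input}, I_{output} >$. That is, it does its best to tell the generated expression apart from the ground truth.

The generator tries to not only maximally confuse the discriminator but also generate an image as close to the target image as possible. That is, it pushes its output close enough to the ground truth to fool the discriminator.

Three objective functions are involved: the generator's objective, the discriminator's objective, and the overall cGAN objective.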
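The objectives are only named here, without formulas. As a rough illustration, the sketch below uses the common conditional-GAN formulation: a log-likelihood adversarial term for each network, plus an L1 term that pulls $I_{output}$ toward $I_{target}$. This is an assumption and may differ from the exact losses in the paper. The function names and the `l1_weight` parameter (with its placeholder default of 1.0) are hypothetical, and images are flattened to plain lists of floats to keep the example dependency-free.

```python
import math
from typing import Sequence


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Loss the discriminator minimizes.

    d_real = D(I_input, I_target) and d_fake = D(I_input, I_output),
    both probabilities strictly between 0 and 1.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def l1_distance(target: Sequence[float], output: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized flattened images."""
    return sum(abs(t - o) for t, o in zip(target, output)) / len(target)


def generator_loss(
    d_fake: float,
    target: Sequence[float],
    output: Sequence[float],
    l1_weight: float = 1.0,  # placeholder value, not taken from the paper
) -> float:
    """Loss the generator minimizes: fool D and stay close to the target."""
    adversarial = -math.log(d_fake)  # small when D believes the fake pair is real
    return adversarial + l1_weight * l1_distance(target, output)
```

With these definitions the two networks pull in opposite directions on `d_fake`: the discriminator's loss falls as `d_fake` approaches 0, while the generator's adversarial term falls as `d_fake` approaches 1, and the L1 term independently rewards outputs that match the ground-truth neutral face.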