SIGGRAPH 2023

LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar

 

Yuelang Xu1, Hongwen Zhang1, Lizhen Wang1,2, Xiaochen Zhao1,2, Han Huang3, Guojun Qi3, Yebin Liu1

1Tsinghua University 2NNKosmos 3OPPO Research

 

Abstract

Existing approaches to animatable NeRF-based head avatars are either built upon face templates or use the expression coefficients of templates as the driving signal. Despite the promising progress, their performance is heavily bound by the expression power and the tracking accuracy of the templates. In this work, we present LatentAvatar, an expressive neural head avatar driven by latent expression codes. Such latent expression codes are learned in an end-to-end and self-supervised manner without templates, freeing our method from the expression and tracking issues of template-based solutions. To achieve this, we leverage a latent head NeRF to learn person-specific latent expression codes from a monocular portrait video, and further design a Y-shaped network to learn the shared latent expression codes of different subjects for cross-identity reenactment. By optimizing the photometric reconstruction objectives in NeRF, the latent expression codes become 3D-aware while faithfully capturing high-frequency detailed expressions. Moreover, by learning a mapping between the latent expression codes of the shared and person-specific settings, LatentAvatar is able to perform expressive reenactment between different subjects. Experimental results show that LatentAvatar captures challenging expressions and the subtle movement of teeth and even eyeballs, outperforming previous state-of-the-art solutions in both quantitative and qualitative comparisons.
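To make the self-supervised objective concrete, below is a minimal PyTorch sketch of a photometric reconstruction loss of the kind the abstract describes. The loss weight is an assumption for illustration, not the authors' exact objective.

import torch.nn.functional as F

def photometric_loss(rendered, target, w_l1=1.0):
    # L1 photometric reconstruction between the rendered avatar frame and
    # the ground-truth input frame. Gradients flow back through the NeRF
    # renderer into the latent expression code, so the code is learned
    # end-to-end without any template or tracked coefficients.
    return w_l1 * F.l1_loss(rendered, target)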

 

Fig 1. We propose LatentAvatar, an expressive neural head avatar driven by latent expression codes. LatentAvatar is able to capture subtle expressions such as pouting (left) and perform expressive reenactment between different subjects (right).

 

[arXiv] [Code]

 


Latent Head NeRF

 

Fig 2. Overview of the Latent Head NeRF. Given a portrait video, we first encode each face image into the latent expression code \(\theta\), which serves as a condition to generate the tri-plane features. Given a 3D position, the feature vector \(H\) is extracted from the tri-plane features for the volume rendering of a low-resolution image and feature map. Finally, a super-resolution network generates the corresponding high-resolution image.
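Below is a minimal PyTorch sketch of the forward pass in Fig 2. All module architectures, dimensions, and the bilinear tri-plane sampling are assumptions for illustration, not the authors' implementation; the volume rendering and super-resolution stages are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentHeadNeRF(nn.Module):
    def __init__(self, code_dim=256, plane_ch=32, plane_res=64):
        super().__init__()
        self.plane_ch, self.plane_res = plane_ch, plane_res
        # Image encoder: face image -> latent expression code theta (assumed CNN).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, code_dim),
        )
        # Tri-plane generator: theta -> three feature planes (XY, XZ, YZ).
        self.plane_gen = nn.Linear(code_dim, 3 * plane_ch * plane_res * plane_res)
        # Small MLP decoding the sampled feature H into density + render feature.
        self.mlp = nn.Sequential(
            nn.Linear(3 * plane_ch, 64), nn.ReLU(),
            nn.Linear(64, 1 + plane_ch),
        )

    def sample_planes(self, planes, xyz):
        # planes: (B, 3, C, R, R); xyz: (B, N, 3) normalized to [-1, 1].
        coords = [xyz[..., [0, 1]], xyz[..., [0, 2]], xyz[..., [1, 2]]]
        feats = []
        for i, uv in enumerate(coords):
            grid = uv.unsqueeze(1)                      # (B, 1, N, 2)
            f = F.grid_sample(planes[:, i], grid,
                              align_corners=False)      # (B, C, 1, N)
            feats.append(f.squeeze(2).transpose(1, 2))  # (B, N, C)
        return torch.cat(feats, dim=-1)                 # tri-plane feature H

    def forward(self, image, xyz):
        theta = self.encoder(image)                     # latent expression code
        planes = self.plane_gen(theta).view(
            -1, 3, self.plane_ch, self.plane_res, self.plane_res)
        h = self.sample_planes(planes, xyz)
        # Density + feature per 3D point; volume rendering of the
        # low-resolution image/feature map and super-resolution would follow.
        return self.mlp(h)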

 


Cross-identity Reenactment

 

Fig 3. Illustration of the Y-shaped network (left) and the mapping MLP (right). In the Y-shaped network, the shared encoder \(E_{shared}\) encodes the input face images into the shared latent expression code, which is decoded by the avatar and actor decoders \(D_{ava}\) and \(D_{act}\) into the face images of the avatar and the actor, respectively. To bridge the shared and person-specific latent spaces, the mapping MLP learns to map the shared latent expression code to the person-specific one.
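The following is a hedged PyTorch sketch of the Y-shaped network and the mapping MLP in Fig 3. Module shapes and sizes are assumptions for illustration; only the topology (one shared encoder, two subject-specific decoders, a small mapping MLP) follows the caption.

import torch.nn as nn

class YShapedNetwork(nn.Module):
    def __init__(self, shared_dim=128):
        super().__init__()
        # Shared encoder E_shared: face image -> shared latent code gamma.
        self.E_shared = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, shared_dim),
        )
        # Two decoders branching from the shared code, one per subject.
        self.D_ava = self._make_decoder(shared_dim)
        self.D_act = self._make_decoder(shared_dim)

    @staticmethod
    def _make_decoder(shared_dim):
        return nn.Sequential(
            nn.Linear(shared_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, image):
        gamma = self.E_shared(image)                 # shared latent code
        return self.D_ava(gamma), self.D_act(gamma)  # avatar / actor images

class MappingMLP(nn.Module):
    # Maps the shared code gamma to the person-specific code theta.
    def __init__(self, shared_dim=128, code_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shared_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )

    def forward(self, gamma):
        return self.net(gamma)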

 

Fig 4. The process of cross-identity reenactment in our method. The face image of the actor is first fed into the shared encoder to obtain the shared latent code \(\gamma\), which is then mapped to the person-specific latent code \(\theta\) to drive the NeRF-based head avatar, as sketched below.
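Reusing the hypothetical modules from the sketches above, the reenactment step of Fig 4 reduces to a few lines; the input resolution is an assumption.

import torch

actor_image = torch.rand(1, 3, 256, 256)   # dummy actor face crop
y_net, mapping_mlp = YShapedNetwork(), MappingMLP()
gamma = y_net.E_shared(actor_image)        # shared latent code gamma
theta = mapping_mlp(gamma)                 # person-specific latent code theta
# theta stands in for the encoder output of LatentHeadNeRF, conditioning the
# tri-plane generation that is volume-rendered and super-resolved into the
# reenacted avatar frame.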

 


Results

 

 

Fig 5. Qualitative comparisons of different methods on the self-reenactment task. From left to right: IMavatar, NeRFace, Coeff+Tri-plane, and Ours. Our method surpasses the others in capturing and reproducing detailed expressions such as the wrinkles around the nose and the degree of teeth exposure.

 

 

Fig 6. Qualitative comparisons of different methods on the cross-identity reenactment task. From left to right: IMavatar, NeRFace, the Coeff+Tri-plane baseline, and Ours. Our method accurately transfers eye movements and teeth grinning, and remains robust under exaggerated expressions.

 

 

Fig 7. More cross-identity reenactment results.

 

 

 

Fig 8. Novel view synthesis.

 

 

 

Fig 9. Multi-view avatar reconstruction.

 

 


Demo Video

 


Citation

@InProceedings{xu2023latentavatar,
title={LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar},
author={Xu, Yuelang and Zhang, Hongwen and Wang, Lizhen and Zhao, Xiaochen and Huang, Han and Qi, Guojun and Liu, Yebin},
booktitle={ACM SIGGRAPH 2023 Conference Proceedings},
pages={},
year={2023}
}