[CVPR 2024] EFHQ: Multi-purpose ExtremePose-Face-HQ dataset

VinAI Research, Vietnam
*Indicates Equal Contribution

Benefits of our proposed dataset (EFHQ). Standard large-scale facial datasets consist mostly of near-frontal images, so models trained on them perform poorly on downstream tasks that involve extreme head poses. For instance, 2D image generators and text-to-image models trained on such data often produce only near-frontal faces, while 3D face generators and face reenactment methods often produce distorted outputs at profile views. The recently proposed LPFF dataset partially addresses this issue by providing complementary extreme-pose images, but only for 2D and 3D image generation. Our proposed dataset, EFHQ, provides high-quality extreme-pose images that complement a wide range of face-related tasks. It supports 2D and 3D image generation, with generally better diversity than LPFF. EFHQ also helps correct the outputs of text-to-image generation and face reenactment at extreme views. Finally, EFHQ provides a more challenging pose-based face verification benchmark for assessing face recognition networks.

Abstract

Existing facial datasets, while rich in near-frontal images, lack images with extreme head poses, which degrades the performance of deep learning models on profile or pitched faces. This work addresses that gap by introducing a novel dataset named Extreme Pose Face High-Quality Dataset (EFHQ), which includes up to 450k high-quality images of faces at extreme poses. To produce such a massive dataset, we use a novel and meticulous processing pipeline to curate two publicly available datasets, VFHQ and CelebV-HQ, which contain many high-resolution face videos captured in various settings. Our dataset can complement existing datasets on various face-related tasks, such as facial synthesis with 2D/3D-aware GANs, diffusion-based text-to-image face generation, and face reenactment. Specifically, training with EFHQ helps models generalize across diverse poses and significantly improves performance in scenarios involving extreme views, as confirmed by extensive experiments. Additionally, we use EFHQ to define a challenging cross-view face verification benchmark, on which the performance of SOTA face recognition models drops 5-37% compared to frontal-to-frontal scenarios. We hope this benchmark will stimulate studies on face recognition under severe pose conditions in the wild.
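To make the idea of a cross-view verification benchmark concrete, the following is a minimal sketch of how verification pairs might be grouped by head pose and scored per group. It is not the benchmark's actual protocol: the 30-degree yaw threshold, the two-bin frontal/profile split, and the function names `pose_bin`, `pair_scenario`, and `accuracy_by_scenario` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pose_bin(yaw_deg: float, frontal_max: float = 30.0) -> str:
    """Label a face as 'frontal' or 'profile' from its yaw angle.

    The 30-degree default is a hypothetical threshold, not a value
    taken from the EFHQ paper.
    """
    return "frontal" if abs(yaw_deg) <= frontal_max else "profile"


def pair_scenario(yaw_a: float, yaw_b: float, frontal_max: float = 30.0) -> str:
    """Name the scenario of a verification pair, independent of pair order."""
    bins = sorted((pose_bin(yaw_a, frontal_max), pose_bin(yaw_b, frontal_max)))
    return "-to-".join(bins)


def accuracy_by_scenario(
    pairs: Iterable[Tuple[float, float, bool, bool]],
    frontal_max: float = 30.0,
) -> Dict[str, float]:
    """Compute verification accuracy separately for each pose scenario.

    Each pair is (yaw_a, yaw_b, predicted_same, actually_same).
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for yaw_a, yaw_b, predicted_same, actually_same in pairs:
        scenario = pair_scenario(yaw_a, yaw_b, frontal_max)
        totals[scenario] += 1
        hits[scenario] += int(predicted_same == actually_same)
    return {scenario: hits[scenario] / totals[scenario] for scenario in totals}


# Toy data, invented purely to exercise the functions.
toy_pairs = [
    (5.0, -10.0, True, True),    # frontal-to-frontal, correct
    (0.0, 20.0, False, False),   # frontal-to-frontal, correct
    (5.0, -80.0, False, True),   # frontal-to-profile, wrong
    (10.0, 75.0, True, True),    # frontal-to-profile, correct
]
toy_result = accuracy_by_scenario(toy_pairs)
```

Reporting accuracy per scenario rather than as a single number is what lets a gap between frontal-to-frontal and cross-view performance become visible.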

2D GAN-based Face Generation

3D-aware GAN-based Face Generation

Comparison between multiview generated samples, with truncation ψ=0.8, of EG3D model trained with various datasets.
Top: FFHQ, Middle: FFHQ+LPFF, Bottom: FFHQ+EFHQ.
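For readers unfamiliar with the truncation parameter ψ in the caption above, here is a minimal sketch of the truncation trick as commonly used in StyleGAN-family generators: a latent code is pulled toward the average latent, with ψ=1 leaving it unchanged and ψ=0 collapsing it to the average. The function name `truncate_latent` and the plain-list representation are assumptions for illustration; this is not the code used to produce the samples shown.

```python
from typing import List, Sequence


def truncate_latent(
    w: Sequence[float], w_avg: Sequence[float], psi: float = 0.8
) -> List[float]:
    """Interpolate a latent code toward the average latent.

    Computes w_avg + psi * (w - w_avg) element-wise. Lower psi trades
    sample diversity for fidelity.
    """
    if len(w) != len(w_avg):
        raise ValueError("w and w_avg must have the same length")
    return [avg + psi * (value - avg) for value, avg in zip(w, w_avg)]


# Toy two-dimensional example with psi = 0.5.
halfway = truncate_latent([1.0, 3.0], [0.0, 1.0], psi=0.5)
```

Because every compared model is sampled with the same ψ, differences between the rows can be attributed to the training data rather than to the sampling setting.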

Face Reenactment

BibTeX

@inproceedings{dao2024efhq,
  title={EFHQ: Multi-purpose ExtremePose-Face-HQ dataset}, 
  author={Trung Tuan Dao and Duc Hong Vu and Cuong Pham and Anh Tran},
  year={2024},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}