Learning Neural Parametric Head Models

CVPR 2023

Simon Giebenhain¹, Tobias Kirschstein¹, Markos Georgopoulos², Martin Rünz², Lourdes Agapito³, Matthias Nießner¹

¹Technical University of Munich, ²Synthesia, ³University College London

We propose to learn a neural parametric head model based on neural fields: first, we capture a large dataset of over 2000 high-fidelity head scans with varying shapes and expressions (left). We then non-rigidly register these scans to generate our training data. As a result of training, we obtain a disentangled latent that spans the space of shapes $\mathbf{z}_{id}$ and expressions $\mathbf{z}_{ex}$ (middle). At inference time, we can leverage the prior of our learned representation by fitting our model to a sparse input point cloud by solving for the latent codes (right).
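The fitting step described above can be illustrated with a minimal sketch. It is not the NPHM implementation: `toy_sdf` is a hypothetical stand-in decoder (a sphere whose radius and x-offset depend on a 2D latent code), and the optimizer, learning rate, and regularization weight are assumptions chosen for the toy problem. The idea it shows is the one in the caption: observed surface points should lie on the zero level set of the decoded SDF, so we solve for the latent code that minimizes their squared SDF values.

```python
import numpy as np

def toy_sdf(points, z):
    """Hypothetical stand-in decoder: a sphere whose radius and x-offset depend on z."""
    center = np.array([z[1], 0.0, 0.0])
    radius = 1.0 + z[0]
    return np.linalg.norm(points - center, axis=1) - radius

def fitting_loss(z, points, reg=1e-4):
    # Observed points should sit on the zero level set of the SDF;
    # the small L2 term keeps the code close to the prior mean (zero).
    return np.mean(toy_sdf(points, z) ** 2) + reg * np.sum(z ** 2)

def numerical_grad(f, z, eps=1e-5):
    # Central finite differences, so the sketch works for any decoder.
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (f(z + step) - f(z - step)) / (2 * eps)
    return grad

def fit_latent(points, dim=2, lr=0.2, steps=500):
    # Plain gradient descent on the latent code, starting from the prior mean.
    z = np.zeros(dim)
    for _ in range(steps):
        z -= lr * numerical_grad(lambda c: fitting_loss(c, points), z)
    return z

# Sparse observation: 50 points sampled from the toy surface defined by z_true.
rng = np.random.default_rng(0)
z_true = np.array([0.3, -0.2])
dirs = rng.normal(size=(50, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
points = dirs * (1.0 + z_true[0]) + np.array([z_true[1], 0.0, 0.0])

z_fit = fit_latent(points)
```

With a learned decoder, the same loop would typically use automatic differentiation instead of finite differences.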


Abstract

We propose a novel 3D morphable model (3DMM) for complete human heads based on hybrid neural fields.
At the core of our model lies a neural parametric representation which disentangles identity and expressions in a disjoint latent space. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF) and model facial expressions with a neural deformation field.
In addition, our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points.
To facilitate generalization, we train our model on a newly-captured dataset of over 2000 face scans from 120 different identities using a custom high-end 3D scanning set-up. Our dataset significantly exceeds comparable existing datasets, both with respect to quality and completeness of geometry, averaging around 3.5M faces per scan.
Finally, we demonstrate that our approach outperforms state-of-the-art methods by a significant margin in terms of fitting error and reconstruction quality.

Video

Latent Shape Interpolation

Here is an interactive viewer for interpolating between four identities. Drag the blue cursor around to change the latent identity description $\mathbf{z}_{\text{id}}$ and observe the resulting change of geometry on the right.

Latent Shape Coordinates
(Quadrilateral linear interpolation between 4 corner identities.)

Resulting geometry in canonical expression.
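As a rough illustration of how a cursor position can be mapped to a latent code, here is a minimal sketch of bilinear interpolation between four corner codes. It assumes the cursor position is normalized to the unit square $(u, v) \in [0, 1]^2$; the function name `bilinear_latent` and the corner layout are hypothetical, and the viewer's actual interpolation code may differ.

```python
import numpy as np

def bilinear_latent(corners, u, v):
    """Blend four corner latent codes.

    corners: array of shape (2, 2, D), indexed as [row, col].
    u, v:    cursor position in [0, 1], horizontal and vertical.
    """
    corners = np.asarray(corners, dtype=float)
    top = (1 - u) * corners[0, 0] + u * corners[0, 1]
    bottom = (1 - u) * corners[1, 0] + u * corners[1, 1]
    return (1 - v) * top + v * bottom

# Four hypothetical 8-dimensional latent codes placed at the corners.
rng = np.random.default_rng(0)
corner_codes = rng.normal(size=(2, 2, 8))

z_center = bilinear_latent(corner_codes, 0.5, 0.5)  # equal mix of all four
z_corner = bilinear_latent(corner_codes, 0.0, 0.0)  # exactly the top-left code
```

The same scheme applies unchanged to expression codes $\mathbf{z}_{\text{ex}}$.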

Latent Expression Interpolation

Here is an interactive viewer allowing for interpolations between four different expressions. Drag the blue cursor around to change $\mathbf{z}_{\text{ex}}$, which deforms a fixed identity on the right.

Latent Expression Coordinates
(Quadrilateral linear interpolation between 4 corner expressions.)

Deformed Neutral Mesh

Expression Transfer

Furthermore, the disentangled structure of NPHM allows us to transfer expressions from one subject to others.
Here, one subject acts as the actor, and we apply its expression codes $\mathbf{z}_{ex}$ to 3 other test set subjects.
The neutral expression shows each subject in their respective neutral expression.
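Because identity and expression live in separate codes, transfer amounts to decoding each target's $\mathbf{z}_{id}$ together with the actor's $\mathbf{z}_{ex}$. The sketch below shows only this recombination. Both decoders are hypothetical toys (a scaled template and a simple vertex offset) and do not reflect the networks used in NPHM.

```python
import numpy as np

TEMPLATE = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [-1.0, 0.0, 0.0]])

def toy_neutral_vertices(z_id):
    """Hypothetical identity decoder: per-axis scaling of a fixed template."""
    return TEMPLATE * (1.0 + z_id)

def toy_deformation(vertices, z_ex):
    """Hypothetical expression network: offsets that grow with vertex height."""
    return np.outer(vertices[:, 1], z_ex)

def decode(z_id, z_ex):
    # Deform the subject's neutral geometry with the given expression code.
    neutral = toy_neutral_vertices(z_id)
    return neutral + toy_deformation(neutral, z_ex)

# One expression code from the actor, reused for every target identity.
z_ex_actor = np.array([0.0, 0.1, 0.05])
target_ids = [np.array([0.1, -0.1, 0.0]),
              np.array([-0.2, 0.05, 0.1]),
              np.array([0.0, 0.2, -0.1])]

transferred = [decode(z_id, z_ex_actor) for z_id in target_ids]
neutrals = [decode(z_id, np.zeros(3)) for z_id in target_ids]
```

A zero expression code leaves each target in its own neutral shape, which mirrors the neutral row of the viewer.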

Source/Actor

Target 1

Target 2

Target 3


Method Overview

1. We capture a dataset of 124 identities in 20 different expressions. The neutral expression has an open mouth to avoid topological issues.

2. We directly train our identity network on the raw neutral scans.

3. Building on 2D facial landmark detectors, we non-rigidly register a common template against all scans.

4. We estimate the ground truth deformation fields between expressions using the registrations and directly supervise our expression network.

5. Our identity network represents shapes implicitly using an ensemble of local MLPs, each described by an individual latent vector. We incorporate a symmetry prior by mirroring the local coordinate systems of symmetric MLPs and sharing their parameters.
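The local ensemble with a symmetry prior from step 5 can be sketched as follows. This is an illustrative toy, not the NPHM architecture: the tiny random MLPs, the Gaussian blending weights, the `sigma` value, and all function names are assumptions. What it demonstrates is the mechanism named above: a right-side anchor evaluates its left partner's MLP on mirrored local coordinates, so both sides share parameters while keeping individual latent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
MIRROR = np.array([-1.0, 1.0, 1.0])  # reflection across the x = 0 plane

def init_mlp(in_dim, hidden=16):
    return {"W1": rng.normal(size=(in_dim, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(size=(hidden, 1)), "b2": np.zeros(1)}

def mlp(params, inputs):
    h = np.tanh(inputs @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"])[:, 0]

def ensemble_sdf(x, anchors, latents, params, param_index, mirrored, sigma=0.5):
    values, weights = [], []
    for a, z, p, flip in zip(anchors, latents, param_index, mirrored):
        local = x - a                  # coordinates relative to the anchor
        if flip:
            local = local * MIRROR     # mirrored local coordinate system
        z_tiled = np.broadcast_to(z, (len(x), z.size))
        values.append(mlp(params[p], np.concatenate([local, z_tiled], axis=1)))
        # Assumed blending: Gaussian weight by distance to the anchor.
        weights.append(np.exp(-np.sum((x - a) ** 2, axis=1) / (2 * sigma ** 2)))
    values, weights = np.stack(values), np.stack(weights)
    return np.sum(weights * values, axis=0) / np.sum(weights, axis=0)

# Two left-side anchors and their mirrored right-side partners.
left_anchors = np.array([[-0.5, 0.2, 0.0], [-0.3, -0.4, 0.1]])
anchors = np.concatenate([left_anchors, left_anchors * MIRROR])
param_index = [0, 1, 0, 1]             # each pair shares one set of weights
mirrored = [False, False, True, True]

latent_dim = 4
params = [init_mlp(3 + latent_dim) for _ in range(2)]
left_latents = rng.normal(size=(2, latent_dim))
latents = np.concatenate([left_latents, left_latents])  # same code on both sides

x = rng.normal(size=(5, 3))
sdf = ensemble_sdf(x, anchors, latents, params, param_index, mirrored)
sdf_mirrored = ensemble_sdf(x * MIRROR, anchors, latents, params, param_index, mirrored)
```

When paired anchors carry identical latent vectors, as in this example, the blended field is exactly mirror-symmetric. Giving each side its own latent vector lets the model express asymmetric detail while still sharing weights.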

Dataset

The animation below displays 4 identities from our dataset, performing all 20 expressions. In total, our dataset contains 124 identities and more than 2200 scans.
Our scans capture a high level of detail and are very complete, including good capture of hair. On average, each mesh has 3.5M faces. (Note that we added 3 additional expressions for our latest 50 scanning subjects.)

BibTeX


@inproceedings{giebenhain2023nphm,
 author = {Simon Giebenhain and Tobias Kirschstein and Markos Georgopoulos and Martin R{\"{u}}nz and Lourdes Agapito and Matthias Nie{\ss}ner},
 title = {Learning Neural Parametric Head Models},
 booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
 year = {2023}
}