Learning Neural Parametric Head Models

1 Technical University of Munich, 2 Synthesia, 3 University College London

We propose to learn a neural parametric head model based on neural fields: first, we capture a large dataset of over 2000 high-fidelity head scans with varying shapes and expressions (left). We then non-rigidly register these scans to generate our training data. As a result of training, we obtain a disentangled latent that spans the space of shapes $\mathbf{z}_{id}$ and expressions $\mathbf{z}_{ex}$ (middle). At inference time, we can leverage the prior of our learned representation by fitting our model to a sparse input point cloud by solving for the latent codes (right).
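The fitting step described above (solving for latent codes so that a sparse point cloud lies on the model's surface) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: `toy_sdf` is a hypothetical stand-in for the learned decoder (a sphere whose "latent" is its center and radius), and the optimizer, step size, and regularization weight are arbitrary choices, not the settings used by NPHM.

```python
import numpy as np

def toy_sdf(points, z):
    """Hypothetical decoder: signed distance to a sphere with center z[:3] and radius z[3]."""
    return np.linalg.norm(points - z[:3], axis=1) - z[3]

def fit_latent(points, steps=500, lr=0.1, reg=1e-4):
    """Gradient descent on z so that the observed points lie on the zero level set."""
    z = np.array([0.0, 0.0, 0.0, 0.5])  # arbitrary initial guess
    for _ in range(steps):
        diff = points - z[:3]
        dist = np.linalg.norm(diff, axis=1)
        residual = dist - z[3]  # SDF value at each observed point
        # Analytic gradient of mean(residual**2) + reg * ||z||^2
        grad_center = (2 * residual[:, None] * (-diff / dist[:, None])).mean(axis=0)
        grad_radius = (-2 * residual).mean()
        grad = np.concatenate([grad_center, [grad_radius]]) + 2 * reg * z
        z = z - lr * grad
    return z

# Synthetic "sparse point cloud": 200 samples on a unit sphere with an offset center.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
points = np.array([0.2, -0.1, 0.3]) + dirs

z_fit = fit_latent(points)
```

In this toy setting the network weights play no role; with a learned model, the same loop would keep the decoder fixed and update only $\mathbf{z}_{id}$ and $\mathbf{z}_{ex}$.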


Abstract

We propose a novel 3D morphable model (3DMM) for complete human heads based on hybrid neural fields.
At the core of our model lies a neural parametric representation which disentangles identity and expressions in a disjoint latent space. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF) and model facial expressions with a neural deformation field.
In addition, our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points.
To facilitate generalization, we train our model on a newly-captured dataset of over 2000 face scans from 120 different identities using a custom high-end 3D scanning set-up. Our dataset significantly exceeds comparable existing datasets, both with respect to quality and completeness of geometry, averaging around 3.5M faces per scan.
Finally, we demonstrate that our approach outperforms state-of-the-art methods by a significant margin in terms of fitting error and reconstruction quality.

Video

Latent Shape Interpolation

Here is an interactive viewer for interpolating between four identities. Drag the blue cursor around to change the latent identity description $\mathbf{z}_{\text{id}}$ and observe the resulting change of geometry on the right.

Latent Shape Coordinates
(Quadrilateral linear interpolation between 4 corner identities.)
Resulting geometry in canonical expression.
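If the interpolation over the cursor square is read as standard bilinear interpolation of the four corner latent codes, it can be sketched as follows. The function name `bilerp_latents`, the latent dimension, and the random corner codes are hypothetical and used only for illustration.

```python
import numpy as np

def bilerp_latents(corners, u, v):
    """Bilinearly blend four latent codes.

    corners: array of shape (2, 2, D), indexed [row, col].
    u, v: cursor position in [0, 1] along columns and rows.
    """
    corners = np.asarray(corners, dtype=float)
    top = (1 - u) * corners[0, 0] + u * corners[0, 1]
    bottom = (1 - u) * corners[1, 0] + u * corners[1, 1]
    return (1 - v) * top + v * bottom

latent_dim = 8  # hypothetical size, not the model's actual latent dimension
rng = np.random.default_rng(0)
corner_codes = rng.normal(size=(2, 2, latent_dim))

z_center = bilerp_latents(corner_codes, 0.5, 0.5)  # cursor in the middle of the square
```

At a corner the result equals that corner's code, and at the center it is the average of all four; the interpolated code would then be decoded into geometry by the model.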

Latent Expression Interpolation

Here is an interactive viewer allowing for interpolations between four different expressions. Drag the blue cursor around to change $\mathbf{z}_{\text{ex}}$, which deforms a fixed identity on the right.

Latent Expression Coordinates
(Quadrilateral linear interpolation between 4 corner expressions.)
Deformed Neutral Mesh

Expression Transfer

Furthermore, the disentangled structure of NPHM allows us to transfer expressions from one subject to others.
Here one subject acts as the actor, and we apply its expression codes $\mathbf{z}_{ex}$ to 3 other test-set subjects.
The neutral expression shows each subject in their respective neutral expression.

Source/Actor
Target 1
Target 2
Target 3
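Because identity and expression live in separate latent codes, transfer amounts to pairing a target's identity code with the actor's expression code. The sketch below uses toy stand-ins: `canonical_sdf` and `deformation` are hypothetical placeholders for the learned networks, and the choice that the deformation maps posed-space points into canonical space is an assumption made for this example.

```python
import numpy as np

def canonical_sdf(x, z_id):
    """Toy identity model: a sphere whose radius is given by z_id[0]."""
    return np.linalg.norm(x, axis=-1) - z_id[0]

def deformation(x, z_ex):
    """Toy expression model: a constant offset taking posed points to canonical space."""
    return np.broadcast_to(z_ex, x.shape)

def posed_sdf(x, z_id, z_ex):
    """Query the identity SDF at the deformed location of each posed-space point."""
    return canonical_sdf(x + deformation(x, z_ex), z_id)

z_id_actor, z_id_target = np.array([1.0]), np.array([1.5])
z_ex_actor = np.array([0.0, 0.2, 0.0])

# Expression transfer: keep the target's identity code, reuse the actor's expression code.
query = np.array([[0.0, 1.3, 0.0]])
transferred = posed_sdf(query, z_id_target, z_ex_actor)
```

Only the pairing of codes changes between the actor and the targets; the networks themselves are shared across all subjects.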

Method Overview

1. We capture a dataset of 124 identities in 20 different expressions. The neutral expression has an open mouth to avoid topological issues.

2. We directly train our identity network on the raw neutral scans.

3. Building on 2D facial landmark detectors, we non-rigidly register a common template against all scans.

4. We estimate the ground truth deformation fields between expressions using the registrations and directly supervise our expression network.

5. Our identity network represents shapes implicitly using an ensemble of local MLPs, each described by an individual latent vector. We incorporate a symmetry prior by mirroring the local coordinate systems of symmetric MLPs and sharing their parameters.
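The symmetry prior in step 5 can be sketched with a toy pair of local fields. This is a minimal illustration under stated assumptions: the tiny random MLP, the anchor position, and the use of the x = 0 plane as the symmetry plane are all hypothetical, and the per-MLP latent vectors are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
# One set of weights, shared by the left local field and its mirrored counterpart.
W1, b1 = rng.normal(size=(3, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 1)), rng.normal(size=1)

def local_mlp(x_local):
    """Toy two-layer ReLU MLP evaluated in a local coordinate system."""
    h = np.maximum(x_local @ W1 + b1, 0.0)
    return (h @ W2 + b2)[..., 0]

MIRROR = np.array([-1.0, 1.0, 1.0])  # reflection across the assumed symmetry plane x = 0

def eval_left(x, anchor_left):
    """Left field: local coordinates are offsets from the left anchor."""
    return local_mlp(x - anchor_left)

def eval_right(x, anchor_left):
    """Right field: same weights, but local coordinates are mirrored."""
    anchor_right = anchor_left * MIRROR
    return local_mlp((x - anchor_right) * MIRROR)

anchor_left = np.array([-0.3, 0.1, 0.5])
x = rng.normal(size=(5, 3))
left_vals = eval_left(x, anchor_left)
right_vals = eval_right(x * MIRROR, anchor_left)  # query the mirrored points
```

With shared weights and mirrored local coordinates, the right field evaluated at mirrored points reproduces the left field, so both halves are learned from the same parameters. In the full model each local MLP also receives its own latent vector, which lets the two sides still differ.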

Dataset

The animation below displays 4 identities from our dataset, each performing all 20 expressions. In total, our dataset contains 124 identities and more than 2200 scans.
Our scans capture a high level of detail and are very complete, including good capture of hair. On average, each mesh has 3.5M faces. (Note that we added 3 additional expressions for our latest 50 scanning subjects.)

Related Links

For more work on similar tasks, please check out the following papers.

NPMs learn a parametric model for human bodies, and SPAMs extend the idea to a part-based representation (torso, legs, arms, head).

i3DMMs and ImFace both learn a neural-field-based 3DMM without requiring non-rigid registration.

Convolutional Occupancy Networks explore regular (2D planes and 3D grids) local conditioning for implicit shape representation. AIR-Net explores attention operators for locally conditioned implicit representations, which allows for less regular structure. Local Deep Implicit Functions also use a flexible, point-based conditioning with Gaussian influence.

BibTeX


@inproceedings{giebenhain2023nphm,
 author={Simon Giebenhain and Tobias Kirschstein and Markos Georgopoulos and  Martin R{\"{u}}nz and Lourdes Agapito and Matthias Nie{\ss}ner},
 title={Learning Neural Parametric Head Models},
 booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
 year = {2023}}