PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images

Hongwen Zhang, Yating Tian, Yuxiang Zhang, Mengcheng Li, Liang An, Zhenan Sun, Yebin Liu
1Tsinghua University 2Nanjing University 3Institute of Automation, CAS
TPAMI 2023

Reconstruction results of PyMAF-X on the COCO validation set

Abstract

We present PyMAF-X, a regression-based approach to recovering parametric full-body models from monocular images. This task is challenging because even minor parametric deviations can cause noticeable misalignment between the estimated mesh and the input image. Moreover, when integrating part-specific estimations into the full-body model, existing solutions tend to either degrade the alignment or produce unnatural wrist poses. To address these issues, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop in our regression network for well-aligned human mesh recovery and extend it as PyMAF-X for the recovery of expressive full-body models. The core idea of PyMAF is to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status. Specifically, given the currently predicted parameters, mesh-aligned evidence is extracted from finer-resolution features and fed back for parameter rectification. To enhance alignment perception, auxiliary dense supervision is employed to provide mesh-image correspondence guidance, while spatial alignment attention is introduced to make our network aware of global contexts. When extending PyMAF to full-body mesh recovery, an adaptive integration strategy is proposed in PyMAF-X to produce natural wrist poses while maintaining the well-aligned performance of the part-specific estimations. The efficacy of our approach is validated on several benchmark datasets for body, hand, face, and full-body mesh recovery, where PyMAF and PyMAF-X effectively improve the mesh-image alignment and achieve new state-of-the-art results.
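To make the feedback loop described above concrete, below is a minimal, hypothetical PyTorch sketch of the mesh alignment feedback idea: given a coarse-to-fine feature pyramid, the current mesh estimate is projected onto the image plane, mesh-aligned features are sampled at the projected vertices, and a regressor predicts a residual that rectifies the parameters at each level. The module names (MAFLoop), the parameter layout (N_PARAMS), the downsampled vertex count (N_VERTS), and the stand-in project() are illustrative assumptions, not the paper's actual implementation.

# A minimal, hypothetical sketch of the Pyramidal Mesh Alignment Feedback
# loop; sizes and names are illustrative, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_PARAMS = 85   # e.g. pose + shape + camera parameters; illustrative
N_VERTS = 431   # a downsampled mesh used for feature sampling; illustrative

class MAFLoop(nn.Module):
    def __init__(self, feat_channels=(256, 128, 64), point_dim=8):
        super().__init__()
        # Reduce per-vertex features before flattening them into a vector.
        self.reducers = nn.ModuleList(
            nn.Linear(c, point_dim) for c in feat_channels)
        # One regressor per pyramid level; each predicts a parameter residual.
        self.regressors = nn.ModuleList(
            nn.Linear(N_VERTS * point_dim + N_PARAMS, N_PARAMS)
            for _ in feat_channels)
        self.init_params = nn.Parameter(torch.zeros(1, N_PARAMS))

    def project(self, params):
        # Stand-in for the body-model forward pass + weak-perspective
        # projection: maps parameters to 2D sampling points in [-1, 1].
        # A real implementation would run the parametric body model here.
        pts = torch.tanh(params[:, :3]).unsqueeze(1).expand(-1, N_VERTS, -1)
        return pts[..., :2]

    def forward(self, pyramid):
        # pyramid: list of feature maps, coarse to fine, each (B, C_i, H_i, W_i)
        B = pyramid[0].shape[0]
        params = self.init_params.expand(B, -1)
        for feat, reduce, regress in zip(pyramid, self.reducers, self.regressors):
            # 1) Project the current mesh estimate onto the image plane.
            pts = self.project(params)                      # (B, V, 2)
            # 2) Sample mesh-aligned evidence at the projected vertices.
            grid = pts.unsqueeze(2)                         # (B, V, 1, 2)
            sampled = F.grid_sample(feat, grid, align_corners=False)
            sampled = sampled.squeeze(-1).transpose(1, 2)   # (B, V, C_i)
            # 3) Feed the evidence back to rectify the parameters (residual).
            evidence = reduce(sampled).flatten(1)           # (B, V * point_dim)
            params = params + regress(torch.cat([evidence, params], dim=1))
        return params

# Usage: three pyramid levels, coarse to fine.
pyramid = [torch.randn(2, c, s, s) for c, s in [(256, 14), (128, 28), (64, 56)]]
params = MAFLoop()(pyramid)
print(params.shape)  # torch.Size([2, 85])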

Videos

Demo (frame-by-frame reconstruction, no post-processing)

Video trimmed from here
Video trimmed from here

Comparison with the state of the art

Face-only Reconstruction
More Comparisons on the NoW benchmark

References

[SMPL-X] G. Pavlakos, V. Choutas, N. Ghorbani, T. Bolkart, A. A. Osman, D. Tzionas, and M. J. Black, "Expressive Body Capture: 3D Hands, Face, and Body from a Single Image," in CVPR, 2019
[ExPose] V. Choutas, G. Pavlakos, T. Bolkart, D. Tzionas, and M. J. Black, "Monocular Expressive Body Regression through Body-driven Attention," in ECCV, 2020
[FrankMocap] Y. Rong, T. Shiratori, and H. Joo, "FrankMocap: A Monocular 3D Whole-body Pose Estimation System via Regression and Integration," in ICCVW, 2021
[PIXIE] Y. Feng, V. Choutas, T. Bolkart, D. Tzionas, and M. J. Black, "Collaborative Regression of Expressive Bodies Using Moderation," in 3DV, 2021
[Hand4Whole] G. Moon, H. Choi, and K. M. Lee, "Accurate 3D Hand Pose Estimation for Whole-body 3D Human Mesh Estimation," in CVPRW, 2022

BibTeX


@article{pymafx2023,
  title={PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images},
  author={Zhang, Hongwen and Tian, Yating and Zhang, Yuxiang and Li, Mengcheng and An, Liang and Sun, Zhenan and Liu, Yebin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}

@inproceedings{pymaf2021,
  title={PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop},
  author={Zhang, Hongwen and Tian, Yating and Zhou, Xinchi and Ouyang, Wanli and Liu, Yebin and Wang, Limin and Sun, Zhenan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}