IEEE ICCV 2021

Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras

Yuxiang Zhang, Zhe Li, Liang An, Mengcheng Li, Tao Yu*, Yebin Liu* (*: corresponding authors)

Department of Automation and BNRist, Tsinghua University

Fig 1. Our lightweight total capture system produces expressive human models from sparse multi-view cameras.

Abstract

Owing to its great potential for behaviour understanding, sports analysis, human animation, video editing and virtual reality, marker-less motion capture has been a popular research topic in computer vision and graphics for decades. Within this field, total motion capture (joo2018total) using an extremely dense-view setup (hundreds of cameras) shows impressive results in simultaneously capturing the total interactive behaviours of multiple people, including facial expressions and body and hand poses, and has aroused widespread interest in the computer vision community. However, that work relies on an expensive and sophisticated hardware setup and suffers from low run-time efficiency.


Overview

Fig 2. Method overview. We take multi-view RGB sequences and body estimation results as inputs. The skeleton of each individual is first constructed by 4D association. Next, our limb bootstrapping framework localizes and associates body parts. We then optimize parametric SMPL-X models from all these outputs. Finally, a feedback mechanism uses the reconstructed human models to boost body association performance in the next frame.
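
Constructing a 3D skeleton from 2D joints that have been associated across views ultimately requires lifting each joint to 3D. A common building block for this step is linear (DLT) triangulation. The sketch below is a generic illustration under stated assumptions (calibrated 3x4 projection matrices, at least two views per joint, optional per-view confidence weights); the function name is hypothetical and this is not the authors' 4D association implementation.

```python
import numpy as np


def triangulate_dlt(projections, points_2d, weights=None):
    """Triangulate one 3D point from its 2D observations in several views.

    Hypothetical helper for illustration, not the paper's code.
    projections: list of 3x4 camera projection matrices (assumed calibrated).
    points_2d:   list of (u, v) image coordinates, one per view (>= 2 views).
    weights:     optional per-view confidences, e.g. 2D keypoint scores.
    Returns the 3D point as a length-3 NumPy array.
    """
    if weights is None:
        weights = [1.0] * len(projections)
    rows = []
    for P, (u, v), w in zip(projections, points_2d, weights):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear constraints on the homogeneous point X:
        # (u * P[2] - P[0]) @ X = 0 and (v * P[2] - P[1]) @ X = 0.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # The least-squares solution of A @ X = 0 with ||X|| = 1 is the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

For a joint seen in several views, call `triangulate_dlt([P1, P2, ...], [(u1, v1), (u2, v2), ...])`; the optional weights let low-confidence detections contribute less to the solution.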

Fig 3. Illustration of the hand association algorithm.
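
The figure does not spell out the association steps in text. As a hedged illustration of the per-view sub-problem only, the sketch below greedily matches detected hand boxes to projected wrists, closest pairs first, so that no wrist receives two hands and no hand box is assigned to two wrists. The function name, the input layout and the `max_dist` threshold are assumptions for illustration; this is not the algorithm shown in Fig 3.

```python
import math


def associate_hands(hand_centers, wrists, max_dist=80.0):
    """Greedily match 2D hand detections to body wrists in one view.

    Hypothetical helper for illustration, not the paper's algorithm.
    hand_centers: list of (x, y) centres of detected hand boxes.
    wrists: dict mapping (person_id, side) -> (x, y) projected wrist,
            where side is "left" or "right".
    max_dist: assumed pixel threshold beyond which no match is made.
    Returns a dict mapping (person_id, side) -> index into hand_centers.
    """
    candidates = []
    for h, (hx, hy) in enumerate(hand_centers):
        for key, (wx, wy) in wrists.items():
            d = math.hypot(hx - wx, hy - wy)
            if d <= max_dist:
                candidates.append((d, h, key))
    candidates.sort()  # closest hand-wrist pairs first
    matched, used_hands = {}, set()
    for _, h, key in candidates:
        # Each hand and each wrist is used at most once, so a single hand box
        # can never be given to both the left and the right wrist.
        if h in used_hands or key in matched:
            continue
        matched[key] = h
        used_hands.add(h)
    return matched
```

Greedy matching is chosen here for brevity; an optimal bipartite assignment (e.g. the Hungarian algorithm) would be a drop-in alternative when many people overlap.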


Results

Fig 5. Results of our system. From left to right: input reference images, parametric model alignment, facial and hand alignment, and 3D visualization from a novel view. (a) Results of a hand-object interaction case from our captured data using 6 views; (b) results of a multi-person interaction scenario using 6 views; (c) results on the CMU dataset using 8 views.

Fig 6. Qualitative evaluation of hand bootstrapping and comparison against FrankMocap. (a) Results of FrankMocap: only a single ROI is extracted for each view, and the left hand (blue rectangle) and right hand (green rectangle) are assigned to the same ROI proposal. (b) Results of our method: all hands are extracted and associated correctly.
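
The failure in (a) stems from proposing a single hand ROI per view. One simple way to guarantee one proposal per hand is to derive a separate box for every wrist of every person from the body skeleton. The sketch below uses a common elbow-to-wrist extrapolation heuristic; the function name and the `extend` and `scale` factors are hypothetical choices for illustration, not values or code from the paper.

```python
import math


def hand_roi_from_arm(elbow, wrist, extend=0.3, scale=1.5):
    """Return a square hand ROI (x0, y0, x1, y1) from 2D elbow and wrist keypoints.

    Hypothetical heuristic for illustration, not the paper's method.
    extend: assumed fraction of the forearm vector to move past the wrist.
    scale:  assumed box side length relative to the forearm length.
    """
    ex, ey = elbow
    wx, wy = wrist
    dx, dy = wx - ex, wy - ey
    forearm = math.hypot(dx, dy)
    # The hand lies beyond the wrist along the forearm direction.
    cx, cy = wx + extend * dx, wy + extend * dy
    half = 0.5 * scale * forearm
    return (cx - half, cy - half, cx + half, cy + half)
```

Calling this once for the left arm and once for the right arm of each detected person yields two distinct proposals per body, so the two hands are never forced into one ROI.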


Citation

Yuxiang Zhang and Zhe Li and Liang An and Mengcheng Li and Tao Yu and Yebin Liu. "Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras". IEEE ICCV 2021

@inproceedings{lightcap2021,
title={Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras},
author={Zhang, Yuxiang and Li, Zhe and An, Liang and Li, Mengcheng and Yu, Tao and Liu, Yebin},
year={2021},
booktitle={IEEE International Conference on Computer Vision}
}